Mass labels

ABSTRACT

Provided is a set of two or more mass labels, each label in the set comprising a mass marker moiety attached via a cleavable linker to a mass normalisation moiety, the mass marker moiety being fragmentation resistant, wherein the aggregate mass of each label in the set may be the same or different and the mass of the mass marker moiety of each label in the set may be the same or different, and wherein in any group of labels within the set having a mass marker moiety of a common mass each label has an aggregate mass different from all other labels in that group, and wherein in any group of labels within the set having a common aggregate mass each label has a mass marker moiety having a mass different from that of all other mass marker moieties in that group, such that all of the mass labels in the set are distinguishable from each other by mass spectrometry.

[0001] This invention relates to useful compounds for labelling analytes, particularly biomolecules such as nucleic acids and proteins. Specifically this invention relates to methods of analysis by mass spectrometry, using specific mass labels.

[0002] Various methods of labelling molecules of interest are known in the art, including radioactive atoms, fluorescent dyes, luminescent reagents, electron capture reagents and light absorbing dyes. Each of these labelling systems has features which make it suitable for certain applications and not others. For reasons of safety, interest in non-radioactive labelling systems has led to the widespread commercial development of fluorescent labelling schemes, particularly for genetic analysis. Fluorescent labelling schemes permit the labelling of a relatively small number of molecules simultaneously: typically four labels can be used simultaneously and possibly up to eight. However, the costs of the detection apparatus and the difficulties of analysing the resultant signals limit the number of labels that can be used simultaneously in a fluorescence detection scheme.

[0003] More recently there has been development in the area of mass spectrometry as a method of detecting labels that are cleavably attached to their associated molecule of interest. In many molecular biology applications one needs to be able to separate the molecules of interest prior to analysis. Generally, liquid phase separations are performed. Mass spectrometry in recent years has developed a number of interfaces for liquid phase separations, which make mass spectrometry particularly effective as a detection system for these kinds of applications. Until recently Liquid Chromatography Mass Spectrometry was used to detect analyte ions or their fragment ions directly. However, for many applications such as nucleic acid analysis, the structure of the analyte can be determined from indirect labelling. This is advantageous particularly with respect to the use of mass spectrometry because complex biomolecules such as DNA have complex mass spectra and are detected with relatively poor sensitivity. Indirect detection means that an associated label molecule can be used to identify the original analyte, the label being designed for sensitive detection and having a simple mass spectrum. Simple mass spectra allow multiple labels to be used to analyse a plurality of analytes simultaneously.

[0004] PCT/GB98/00127 describes arrays of nucleic acid probes covalently attached to cleavable labels that are detectable by mass spectrometry, which identify the sequence of the covalently linked nucleic acid probe. The labelled probes of this application have the structure Nu-L-M where Nu is a nucleic acid covalently linked to L, a cleavable linker, covalently linked to M, a mass label. Preferred cleavable linkers in this application cleave within the ion source of the mass spectrometer. Preferred mass labels are substituted poly-aryl ethers. This application discloses a variety of ionisation methods, and analysis by quadrupole mass analysers, time of flight (TOF) analysers and magnetic sector instruments as specific methods of analysing mass labels by mass spectrometry.

[0005] PCT/GB94/01675 discloses ligands, and specifically nucleic acids, cleavably linked to mass tag molecules. Preferred cleavable linkers are photo-cleavable. This application discloses Matrix Assisted Laser Desorption Ionisation (MALDI) TOF mass spectrometry as a specific method of analysing mass labels by mass spectrometry.

[0006] PCT/US97/22639 discloses releasable non-volatile mass-label molecules. In preferred embodiments these labels comprise polymers, typically biopolymers which are cleavably attached to a reactive group or ligand, i.e. a probe. Preferred cleavable linkers appear to be chemically or enzymatically cleavable. This application discloses MALDI TOF mass spectrometry as a specific method of analysing mass labels by mass spectrometry.

[0007] PCT/US97/01070, PCT/US97/01046, and PCT/US97/01304 disclose ligands, and specifically nucleic acids, cleavably linked to mass tag molecules. Preferred cleavable linkers appear to be chemically or photo-cleavable. These applications disclose a variety of ionisation methods and analysis by quadrupole mass analysers, TOF analysers and magnetic sector instruments as specific methods of analysing mass labels by mass spectrometry.

[0008] The mass spectra generated for an analyte material are very sensitive to contaminants. Essentially, any material introduced into the mass spectrometer that can ionise will appear in the mass spectrum. This means that for many analyses it is necessary to carefully purify the analyte before introducing it into the mass spectrometer. For the purposes of high throughput systems for indirect analysis of analytes through mass labels it would be desirable to avoid any unnecessary sample preparation steps. That is to say it would be desirable to be able to detect labels in a background of contaminating material and be certain that the peak that is detected does in fact correspond to a label. The prior art does not disclose methods or compositions that can improve the signal to noise ratio achievable in mass spectrometry based detection systems or that can provide confirmation that a mass peak in a spectrum was caused by the presence of a mass label.

[0009] For the purposes of detection of analytes after liquid chromatography or electrophoretic separations, it is desirable that the labels used minimally interfere with the separation process. If an array of such labels is used, it is desirable that the effect of each member of the array on its associated analyte is the same as that of every other label. This conflicts to some extent with the intention of mass marking, which is to generate arrays of labels that are resolvable in the mass spectrometer on the basis of their mass. Mass labels should preferably be resolved by 4 daltons to prevent interference of isotope peaks from one label with those of another label. This means that to generate 250 distinct mass labels would require labels spread over a range of about 1000 daltons and probably more, since it is not trivial to generate large arrays of labels separated by exactly 4 daltons. This range of mass will almost certainly result in mass labels that will have a distinct effect on any separation process that precedes detection by mass spectrometry. It also has implications for instrument design, in that as the mass range over which a mass spectrometer can detect ions increases, the cost of the instrument increases.
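The figure of about 1000 daltons quoted above can be checked with a short calculation. The following is a minimal sketch in Python; the function name `required_mass_range` is illustrative only and does not appear in the specification, and the labels are assumed to be evenly spaced at exactly 4 daltons, which the text notes is difficult to achieve in practice.

```python
def required_mass_range(n_labels: int, spacing_da: int) -> int:
    """Span in daltons between the lightest and heaviest label.

    Assumes every label is separated from its neighbour by exactly
    `spacing_da` daltons (an idealisation; real arrays need more room).
    """
    if n_labels < 1:
        raise ValueError("at least one label is required")
    return (n_labels - 1) * spacing_da


# 250 labels at 4 dalton spacing: just under 1000 daltons.
print(required_mass_range(250, 4))  # 996
```

Under these assumptions the span is 996 daltons, consistent with the "about 1000 daltons and probably more" stated in the text.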

[0010] It is thus an object of this invention to solve the problems associated with the above prior art, and provide mass labels which can be detected in a background of contamination and whose identity as mass labels can be confirmed. Furthermore it is an object of this invention to provide arrays of labels which can be resolved in a compressed mass range so that the labels do not interfere as much with separation processes and which can be detected easily in a mass spectrometer that detects ions over a limited range of mass to charge ratios.

[0011] It is also an object of this invention to provide methods of analysing biomolecules which exploit the labels of this invention to maximise throughput, signal to noise ratios and sensitivity of such assays, particularly in genetic analysis and more particularly 2-dimensional gel electrophoresis which is used to analyse proteins.

[0012] Furthermore the design of the mass labels disclosed below allows a simplified tandem mass spectrometer to be designed for the purposes of detecting mass labels. The first mass analyser need only select a limited number of ions whose mass is relatively low. The second mass analyser need only detect a small number of fragmentation products.

[0013] Accordingly, the present invention provides a set of two or more mass labels, each label in the set comprising a mass marker moiety attached via a cleavable linker to a mass normalisation moiety, the mass marker moiety being fragmentation resistant, wherein the aggregate mass of each label in the set may be the same or different and the mass of the mass marker moiety of each label in the set may be the same or different, and wherein in any group of labels within the set having a mass marker moiety of a common mass each label has an aggregate mass different from all other labels in that group, and wherein in any group of labels within the set having a common aggregate mass each label has a mass marker moiety having a mass different from that of all other mass marker moieties in that group, such that all of the mass labels in the set are distinguishable from each other by mass spectrometry.

[0014] The term mass marker moiety used in the present context is intended to refer to a moiety that is to be detected by mass spectrometry, whilst the term mass normalisation moiety used in the present context is intended to refer to a moiety that is not necessarily to be detected by mass spectrometry, but is present to ensure that a mass label has a desired aggregate mass. The number of labels in the set is not especially limited, provided that the set comprises a plurality of labels. However, it is preferred if the set comprises two or more, three or more, four or more, or five or more labels.

[0015] The present invention also provides an array of mass labels, comprising two or more sets of mass labels as defined above, wherein the aggregate mass of each of the mass labels in any one set is different from the aggregate mass of each of the mass labels in every other set in the array.

[0016] Further provided by the invention is a method of analysis, which method comprises detecting an analyte by identifying by mass spectrometry a mass label or a combination of mass labels unique to the analyte, wherein the mass label is a mass label from a set or an array of mass labels as defined above.

[0017] The invention will now be described in further detail by way of example only, with reference to the accompanying drawings, in which:

[0018] FIG. 1 shows a schematic layout of a triple quadrupole mass spectrometer;

[0019] FIG. 2 shows ten fragments comprising five mass normalisation moieties (M₀-M₄), and five mass marker moieties (X₀-X₄), for forming a set of labels according to the present invention, in which fluorine atom substituents are employed as mass adjuster moieties;

[0020] FIG. 3 shows a set of five labels according to the present invention, formed from the mass normalisation moieties and mass marker moieties of FIG. 2;

[0021] FIG. 4 shows a set of five mass labels according to the present invention in which all the labels have a different mass, but in which all the mass markers of the set have the same mass;

[0022] FIG. 5 shows an example of labelling an analyte such as an oligonucleotide with a combination of mass labels, such that the mass label combination has a unique mass spectrum which identifies the analyte;

[0023] FIG. 6 shows an array of sets of mass labels, each set having the same mass series modifying group (S) and being distinct from all other sets by virtue of the number of fluorine substituents on the base phenyl group;

[0024] FIG. 7 shows an array of sets of mass labels, each set having the same mass series modifying group (S) and being distinct from all other sets by virtue of the number of phenyl ether units in the mass series modifying group;

[0025] FIG. 8 illustrates the “mixing mode” embodiment of the present invention, showing 4 of the 8 possible unique mass spectra for all combinations of three mass labels P, Q and R when present in relative quantities of 0 or 1;

[0026] FIG. 9 illustrates the “mixing mode” embodiment of the present invention, showing 8 of the 243 possible unique mass spectra for all combinations of five mass labels P, Q, R, S and T when present in relative quantities of 0, 1 or 2 (there are 81 possible spectra if T remains constant as an internal standard);

[0027] FIG. 10 shows how larger sets of labels can be formed by enlarging the mass normalisation and mass marker moieties to allow more scope for substitution (this set of labels has nine members and uses fluorine atom substituents as mass adjuster moieties; a set of labels having at least 8 members such as this is convenient for labelling all 256 4-mers in an array of oligonucleotides using the mixing mode of the present invention);

[0028] FIG. 11 shows mass spectrum 1, which is a complete spectrum comprising peaks from all ions A⁺, B⁺, C⁺, and D⁺;

[0029] FIG. 12 shows mass spectrum 2, which is a spectrum of A⁺ only, produced by selecting for A⁺ ions in a first quadrupole of the spectrometer (Q1);

[0030] FIG. 13 shows mass spectrum 3, which is a spectrum of a first ion A₁⁺ (of the same mass/charge ratio as A⁺) and fragmentation products of A₁⁺, P⁺ and Q⁺;

[0031] FIG. 14 shows mass spectrum 4, which is a spectrum of a second ion A₂⁺ (of the same mass/charge ratio as A⁺) and fragmentation products of A₂⁺, X⁺ and Y⁺;

[0032] FIG. 15 shows mass spectrum 5, which is a spectrum formed by selecting for A⁺ ions when two types of such ions are present, A₁⁺ and A₂⁺;

[0033] FIG. 16 shows mass spectrum 6, which is a spectrum formed in a triple quadrupole spectrometer by selecting in Q1 for A⁺ ions when two types of such ions are present (A₁⁺ and A₂⁺), inducing dissociation of the selected ions by collision in Q2, and selecting for a known collision product of A₁⁺ (P⁺) in Q3 (such a procedure allows resolution of A₁⁺ and A₂⁺);

[0034] FIG. 17 shows mass spectrum 7, which is a 2-dimensional spectrum of a set of five mass labels according to the present invention, in which a mass MX is selected in Q1 (first dimension) and five distinct masses X₀, X₁, X₂, X₃, and X₄ are selected in Q3 (second dimension);

[0035] FIG. 18 shows mass spectrum 8, which is a 2-dimensional spectrum of a set of four mass labels according to the present invention, in which four distinct masses, M₀X₀, M₁X₀, M₂X₀, and M₃X₀, are selected in Q1 (first dimension) and a single mass M₀ is selected in Q3 (second dimension);

[0036] FIG. 19 shows mass spectrum 9, which is a 2-dimensional spectrum of a set of mass labels comprising labels formed from all combinations of M₀-M₃ with X₀-X₃, in which seven distinct masses are selected in Q1 (first dimension) and four distinct masses X₀-X₃ are selected in Q3 (second dimension);

[0037] FIG. 20 shows a schematic of a typical cleavage process using the mass labels of the present invention and cleaving them from their analytes thermally, or using electrospray ionisation;

[0038] FIG. 21 shows a schematic of the selection procedures in 2-dimensional mass spectrometry using a set of five mass labels according to the present invention;

[0039] FIG. 22 shows deuterated mass labels according to the present invention;

[0040] FIG. 23 shows further deuterated mass labels according to the present invention; and

[0041] FIG. 24 shows a theoretical spectrum for two samples of a peptide with the sequence H₂N-gly-leu-ala-ser-glu-COOH, where each sample is attached to one of the labels with the formulae shown in FIG. 23.
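The counts of unique spectra quoted for the "mixing mode" in the descriptions of FIG. 8, FIG. 9 and FIG. 10 follow from enumerating every assignment of a relative quantity to each label. The following Python sketch illustrates this; the function name `mixing_mode_spectra` is hypothetical, and the all-zero combination (no label present) is counted as one of the spectra, which is an assumption about how the quoted totals are reached.

```python
from itertools import product


def mixing_mode_spectra(labels, levels):
    """Enumerate every combination of relative quantities.

    `labels` is an iterable of label names and `levels` the permitted
    relative quantities. Each result maps label name -> quantity.
    """
    labels = list(labels)
    return [dict(zip(labels, combo))
            for combo in product(levels, repeat=len(labels))]


# Three labels, quantities 0 or 1 (FIG. 8): 2**3 combinations.
print(len(mixing_mode_spectra("PQR", (0, 1))))        # 8
# Five labels, quantities 0, 1 or 2 (FIG. 9): 3**5 combinations.
print(len(mixing_mode_spectra("PQRST", (0, 1, 2))))   # 243
# T held constant as an internal standard: only four labels vary.
print(len(mixing_mode_spectra("PQRS", (0, 1, 2))))    # 81
# Eight labels, quantities 0 or 1: enough for all 4**4 = 256 4-mers.
print(len(mixing_mode_spectra("ABCDEFGH", (0, 1))))   # 256
```

In general, n labels with k permitted quantities give k**n combinations, which is why a set of at least 8 binary labels suffices for 256 4-mers.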

[0042] In one preferred embodiment, the present invention provides a set of mass labels as defined above, in which each label in the set has a mass marker moiety having a common mass and each label in the set has a unique aggregate mass. An example of a set of labels of this first type is given in FIG. 4.

[0043] In an alternative, more preferred embodiment, each label in the set has a common aggregate mass and each label in the set has a mass marker moiety of a unique mass. An example of a set of labels of this second type is given in FIG. 3.

[0044] The set of labels need not be limited to the two preferred embodiments described above, and may for example comprise labels of both types, provided that all labels are distinguishable by mass spectrometry, as outlined above.
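The distinguishability condition for a mixed set can be expressed compactly: two labels are distinguishable if they differ in aggregate mass or in mass marker mass, so the pair (aggregate mass, marker mass) must be unique for every label. The following Python sketch checks this. The names `Label` and `all_distinguishable` and all numerical masses are hypothetical, chosen only for illustration; the example models a set formed from all combinations of M₀-M₃ with X₀-X₃, assuming each successive moiety is heavier by the same nominal increment.

```python
from collections import namedtuple

Label = namedtuple("Label", "name aggregate_mass marker_mass")


def all_distinguishable(labels):
    """True if no two labels share both aggregate mass and marker mass."""
    pairs = [(lab.aggregate_mass, lab.marker_mass) for lab in labels]
    return len(set(pairs)) == len(pairs)


# Hypothetical nominal masses; each step adds 18 mass units.
m_masses = [100 + 18 * i for i in range(4)]   # M0..M3
x_masses = [50 + 18 * j for j in range(4)]    # X0..X3

labels = [Label(f"M{i}X{j}", m_masses[i] + x_masses[j], x_masses[j])
          for i in range(4) for j in range(4)]

print(all_distinguishable(labels))                      # True
print(len({lab.aggregate_mass for lab in labels}))      # 7
print(len({lab.marker_mass for lab in labels}))         # 4
```

Under these assumptions the sixteen labels occupy seven distinct aggregate masses and four distinct marker masses, matching the counts given for the spectrum of FIG. 19.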

[0045] It is preferred that, in a set of labels of the second type, each mass marker moiety in the set has a common basic structure and each mass normalisation moiety in the set has a common basic structure, and each mass label in the set comprises one or more mass adjuster moieties, the mass adjuster moieties being attached to or situated within the basic structure of the mass marker moiety and/or the basic structure of the mass normalisation moiety. In this embodiment, every mass marker moiety in the set comprises a different number of mass adjuster moieties and every mass label in the set has the same number of mass adjuster moieties.

[0046] Throughout this description, by common basic structure, it is meant that two or more moieties share a structure which has substantially the same structural skeleton, backbone or core. This skeleton or backbone may be for example a phenyl ether moiety. The skeleton or backbone may comprise substituents pendent from it, or atomic or isotopic replacements within it, without changing the common basic structure.

[0047] Typically, a set of mass labels of the second type referred to above comprises mass labels with the formula:

M(A)_(y)-L-X(A)_(z)

[0048] wherein M is the mass normalisation moiety, X is the mass marker moiety, A is a mass adjuster moiety, L is a cleavable linker, y and z are integers of 0 or greater, and y+z is an integer of 1 or greater. Preferably M is a fragmentation resistant group, L is a linker that is susceptible to fragmentation on collision with another molecule or atom and X is preferably a pre-ionised, fragmentation resistant group. The sum of the masses of M and X is the same for all members of the set. Preferably M and X have the same basic structure or core structure, this structure being modified by the mass adjuster moieties.

[0049] The mass adjuster moiety ensures that the sum of the masses of M and X is the same for all mass labels in a set, but ensures that each X has a distinct (unique) mass.
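The way the mass adjuster moieties keep the sum of M and X constant while giving each X a unique mass can be illustrated by redistributing a fixed total number of adjusters between the two moieties. The following Python sketch assumes fluorine substituents replacing hydrogen, giving a nominal shift of 18 mass units per adjuster (19 for F less 1 for H). The function name `isobaric_set` and the base masses are hypothetical, and the linker mass is omitted since it is common to all labels.

```python
F_FOR_H_SHIFT = 18  # nominal mass units: F (19) replacing H (1)


def isobaric_set(m_base, x_base, total_adjusters, shift=F_FOR_H_SHIFT):
    """Build labels M(A)y-L-X(A)z with y + z fixed at total_adjusters.

    m_base and x_base are hypothetical nominal masses of the unadjusted
    mass normalisation and mass marker moieties.
    """
    labels = []
    for z in range(total_adjusters + 1):
        y = total_adjusters - z
        m_mass = m_base + y * shift
        x_mass = x_base + z * shift
        labels.append({"y": y, "z": z, "M": m_mass, "X": x_mass,
                       "sum": m_mass + x_mass})
    return labels


# Five labels sharing four adjusters in total.
for label in isobaric_set(m_base=100, x_base=50, total_adjusters=4):
    print(label)
```

Each label in the output has the same value of `sum` while the value of `X` increases by one shift per label, so the labels coincide in the first mass analyser and separate only after cleavage of L.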

[0050] A preferred set of mass labels having the above structure is one wherein each of the labels in the set has the following structure:

[0051] wherein R is hydrogen or is a substituted or unsubstituted aliphatic, aromatic, cyclic or heterocyclic group, L is the cleavable linker and A is the mass adjuster moiety, each p is the same and is an integer of 0 or greater, each y′ may be the same or different and is an integer of 0-4, the sum of all y′ being equal to y, each z′ may be the same or different and is an integer of 0-4, the sum of all z′ being equal to z. Preferably R is H, L is an amide bond, p=0, and A is an F atom.

[0052] In the present context, the substitution pattern on the R group is not at all limited. The substituent or substituents may comprise any organic group and/or one or more atoms from any of groups IIIA, IVA, VA, VIA or VIIA of the Periodic Table, such as a B, Si, N, P, O, or S atom or a halogen atom (e.g. F, Cl, Br or I).

[0053] When the substituent comprises an organic group, the organic group may comprise a hydrocarbon group. The hydrocarbon group may comprise a straight chain, a branched chain or a cyclic group. Independently, the hydrocarbon group may comprise an aliphatic or an aromatic group. Also independently, the hydrocarbon group may comprise a saturated or unsaturated group.

[0054] When the hydrocarbon comprises an unsaturated group, it may comprise one or more alkene functionalities and/or one or more alkyne functionalities. When the hydrocarbon comprises a straight or branched chain group, it may comprise one or more primary, secondary and/or tertiary alkyl groups. When the hydrocarbon comprises a cyclic group it may comprise an aromatic ring, an aliphatic ring, a heterocyclic group, and/or fused ring derivatives of these groups. The cyclic group may thus comprise a benzene, naphthalene, anthracene, indene, fluorene, pyridine, quinoline, thiophene, benzothiophene, furan, benzofuran, pyrrole, indole, imidazole, thiazole, and/or an oxazole group, as well as regioisomers of the above groups.

[0055] The number of carbon atoms in the hydrocarbon group is not especially limited, but generally the hydrocarbon group comprises from 1-40 C atoms. The hydrocarbon group may thus be a lower hydrocarbon (1-6 C atoms) or a higher hydrocarbon (7 C atoms or more, e.g. 7-40 C atoms). The number of atoms in the ring of the cyclic group is not especially limited, but the ring of the cyclic group may comprise from 3-10 atoms, such as 3, 4, 5, 6 or 7 atoms.

[0056] The groups comprising heteroatoms defined above, as well as any of the other groups defined above, may comprise one or more heteroatoms from any of groups IIIA, IVA, VA, VIA or VIIA of the Periodic Table, such as a B, Si, N, P, O, or S atom or a halogen atom (e.g. F, Cl, Br or I). Thus the substituent may comprise one or more of any of the common functional groups in organic chemistry, such as hydroxy groups, carboxylic acid groups, ester groups, ether groups, aldehyde groups, ketone groups, amine groups, amide groups, imine groups, thiol groups, thioether groups, sulphate groups, sulphonic acid groups, and phosphate groups. The substituent may also comprise derivatives of these groups, such as carboxylic acid anhydrides and carboxylic acid halides.

[0057] In addition, any substituent may comprise a combination of two or more of the substituents and/or functional groups defined above.

[0058] The arrays of mass labels of the present invention are not particularly limited, provided that they contain a plurality of sets of mass labels according to the present invention. It is preferred that the arrays comprise two or more, three or more, four or more, or five or more sets of mass labels. Preferably each mass label in the array has either of the following structures:

(S)_(x)-M(A)_(y)-L-X(A)_(z)

M(A)_(y)-(S)_(x)-L-X(A)_(z)

[0059] wherein S is the mass series modifying group, M is the mass normalisation moiety, X is the mass marker moiety, A is the mass adjuster moiety, L is the cleavable linker, x is an integer of 0 or greater, y and z are integers of 0 or greater, and y+z is an integer of 1 or greater.

[0060] A preferred array of mass labels of the above type is one in which the mass labels have either of the following structures:

[0061] wherein R is hydrogen or is a substituted or unsubstituted aliphatic, aromatic, cyclic or heterocyclic group, each p is the same and is an integer of 0 or greater, x is an integer of 0 or greater, each x for any one set being different from the x of every other set in the array, each y′ may be the same or different and is an integer of 0-4, the sum of all y′ being equal to y, and each z′ may be the same or different and is an integer of 0-4, the sum of all z′ being equal to z. An array of this type is depicted in FIG. 7.

[0062] In an alternative preferred aspect, the array of mass labels may comprise mass labels having either of the following structures:

S(A*)_(r)-M(A)_(y)-L-X(A)_(z)

M(A)_(y)-S(A*)_(r)-L-X(A)_(z)

[0063] wherein S is a mass series modifying group, M is the mass normalisation moiety, X is the mass marker moiety, A is a mass adjuster moiety of the mass marker and mass normalisation moieties, A* may be the same or different from A and is a mass adjuster moiety of the mass series modifying groups, L is the cleavable linker, r is an integer of 0 or greater and is at least 1 for one or more sets of mass labels in the array, y and z are integers of 0 or greater, and y+z is an integer of 1 or greater. Preferably, M is a fragmentation resistant group, L is a linker that is susceptible to fragmentation on collision with another molecule or atom and X is preferably a pre-ionised, fragmentation resistant group. S is typically a group such that each member of the array of sets of labels comprises an S whose mass is separated by preferably 4 daltons from every other S of every other member of the array. Thus each different set of mass labels has a distinct (unique) mass.
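The effect of combining sets into an array on the mass range that the first analyser must cover can be estimated as follows. This Python sketch assumes that every set is of the second type (all labels in a set share one aggregate mass) and that the mass series modifying groups of successive sets differ by exactly 4 daltons; the function name `array_layout` and the choice of 50 sets of 5 labels are illustrative assumptions only.

```python
def array_layout(n_sets, labels_per_set, series_spacing_da=4):
    """Return (total labels, span of aggregate masses in daltons).

    Assumes labels within a set share one aggregate mass, so only the
    mass series modifying group S separates the sets in the first
    mass analyser.
    """
    total_labels = n_sets * labels_per_set
    aggregate_span_da = (n_sets - 1) * series_spacing_da
    return total_labels, aggregate_span_da


total, span = array_layout(n_sets=50, labels_per_set=5)
print(total, span)  # 250 196
```

Under these assumptions 250 labels occupy an aggregate mass span of 196 daltons, compared with roughly 1000 daltons when every label must have its own aggregate mass. The mass marker moieties still require their own, separate range in the second analyser.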

[0064] A preferred array of mass labels of the above latter type is one in which the mass labels in the array have either of the following structures:

[0065] wherein R is hydrogen or is a substituted or unsubstituted aliphatic, aromatic, cyclic or heterocyclic group, each p is the same and is an integer of 0 or greater, x is an integer of 0 or greater, x being the same for all mass labels in the array, each y′ may be the same or different and is an integer of 0-4, the sum of all y′ being equal to y, each z′ may be the same or different and is an integer of 0-4, the sum of all z′ being equal to z, and each r′ may be the same or different, the sum of all r′ being equal to r. An array of this type is depicted in FIG. 6.

[0066] In the above sets and arrays of this invention, the common basic structure of the M, X and S groups is not particularly limited and may comprise a cyclic and/or a non-cyclic group. The nature of M, X and S is not particularly limited. However, it is preferred that M and/or X, and/or S comprise as a basic (core) structure, a cyclic group, such as an aryl, a cycloalkyl or a heterocyclic group. These groups may be unsubstituted, but are preferably substituted. M, X and/or S may respectively comprise an oligomer or polymer formed from the above cyclic monomers, where the cyclic monomers are linked by a fragmentation resistant bond or group.

[0067] Aryl ethers, such as a phenyl ether group and their oligomers and polymers, especially substituted aryl ethers, are preferred common basic structures for M, X and S.

[0068] The cleavable linker group L is not particularly limited. However, it is preferred that L comprises a group which is cleavable by collision, and/or is cleavable in a mass spectrometer. Preferably the group L comprises an amide bond.

[0069] In a further preferred aspect, this invention provides sets and arrays of mass labels which can be reacted with analyte molecules, the mass labels having the form:

Re-L′-label or Re-L′-S-label

[0070] where Re is a reactive functionality or group which allows the mass label to be reacted covalently to an appropriate functional group in an analyte molecule, such as, but not limited to, a nucleotide, oligonucleotide, polynucleotide, amino acid, peptide or polypeptide. L′ is a linker which may or may not be cleavable, and label is a mass label from any of the sets or arrays defined above. S has the same meaning as defined above. L′ may be a cleavable linker if desired, such as a cleavable linker L, as defined above.

[0071] In preferred embodiments of the above aspects of the invention, L and/or L′ are cleavable within the mass spectrometer and preferably within the ion source of the mass spectrometer.

[0072] Linker Groups

[0073] In the discussion above and below reference is made to linker groups which may be used to connect molecules of interest to the mass label compounds of this invention. A variety of linkers is known in the art which may be introduced between the mass labels of this invention and their covalently attached analyte. Some of these linkers may be cleavable. Oligo- or poly-ethylene glycols or their derivatives may be used as linkers, such as those disclosed in Maskos, U. & Southern, E. M. Nucleic Acids Research 20: 1679-1684, 1992. Succinic acid based linkers are also widely used, although these are less preferred for applications involving the labelling of oligonucleotides as they are generally base labile and are thus incompatible with the base mediated de-protection steps used in a number of oligonucleotide synthesisers.

[0074] Propargylic alcohol is a bifunctional linker that provides a linkage that is stable under the conditions of oligonucleotide synthesis and is a preferred linker for use with this invention in relation to oligonucleotide applications. Similarly 6-aminohexanol is a useful bifunctional reagent to link appropriately functionalised molecules and is also a preferred linker.

[0075] A variety of known cleavable linker groups may be used in conjunction with the compounds of this invention, such as photocleavable linkers. Ortho-nitrobenzyl groups are known as photocleavable linkers, particularly 2-nitrobenzyl esters and 2-nitrobenzylamines, which cleave at the benzylamine bond. For a review on cleavable linkers see Lloyd-Williams et al., Tetrahedron 49, 11065-11133, 1993, which covers a variety of photocleavable and chemically cleavable linkers.

[0076] WO 00/02895 discloses vinyl sulphone compounds as cleavable linkers, which are also applicable for use with this invention, particularly in applications involving the labelling of polypeptides, peptides and amino acids. The content of this application is incorporated by reference.

[0077] WO 00/02895 discloses the use of silicon compounds as linkers that are cleavable by base in the gas phase. These linkers are also applicable for use with this invention, particularly in applications involving the labelling of oligonucleotides. The content of this application is incorporated by reference.

[0078] In the discussion below, reference is made to reactive functionalities, Re, to allow compounds of the invention to be linked to other compounds, whether reporter groups or analyte molecules. A variety of reactive functionalities may be introduced into the mass labels of this invention.

[0079] Table 1 below lists some reactive functionalities that may be reacted with nucleophilic functionalities which are found in biomolecules to generate a covalent linkage between the two entities. For applications involving synthetic oligonucleotides, primary amines or thiols are often introduced at the termini of the molecules to permit labelling. Any of the functionalities listed below could be introduced into the compounds of this invention to permit the mass markers to be attached to a molecule of interest. A reactive functionality can be used to introduce a further linker group with a further reactive functionality if that is desired. Table 1 is not intended to be exhaustive and the present invention is not limited to the use of only the listed functionalities.

TABLE 1

Nucleophilic Functionality    Reactive Functionality        Resultant Linking Group
—SH                           —SO₂—CH═CR₂                   —S—CR₂—CH₂—SO₂—
—NH₂                          —SO₂—CH═CR₂                   —N(CR₂—CH₂—SO₂—)₂ or —NH—CR₂—CH₂—SO₂—
—NH₂                          (structure not reproduced)    —CO—NH—
—NH₂                          (structure not reproduced)    —CO—NH—
—NH₂                          —NCO                          —NH—CO—NH—
—NH₂                          —NCS                          —NH—CS—NH—
—NH₂                          —CHO                          —CH₂—NH—
—NH₂                          —SO₂Cl                        —SO₂—NH—
—NH₂                          —CH═CH—                       —NH—CH₂—CH₂—
—OH                           —OP(NCH(CH₃)₂)₂               —OP(═O)(O)O—

[0080] It should be noted that in applications involving labelling oligonucleotides with the mass markers of this invention, some of the reactive functionalities above or their resultant linking groups might have to be protected prior to introduction into an oligonucleotide synthesiser. Preferably unprotected ester, thioether and thioesters, amine and amide bonds are to be avoided, as these are not usually stable in an oligonucleotide synthesiser. A wide variety of protective groups is known in the art which can be used to protect linkages from unwanted side reactions.

[0081] In the discussion below reference is made to “charge carrying functionalities” and solubilising groups. These groups may be introduced into the mass labels such as in the mass markers of the invention to promote ionisation and solubility. The choice of markers is dependent on whether positive or negative ion detection is to be used. Table 2 below lists some functionalities that may be introduced into mass markers to promote either positive or negative ionisation. The table is not intended as an exhaustive list, and the present invention is not limited to the use of only the listed functionalities.

TABLE 2

Positive Ion Mode    Negative Ion Mode
—NH₂                 —SO₃⁻
—NR₂                 —PO₄⁻
—NR₃⁺                —PO₃⁻
                     —CO₂⁻
—SR₂⁺

[0082] WO 00/02893 discloses the use of metal-ion binding moieties such as crown-ethers or porphyrins for the purpose of improving the ionisation of mass markers. These moieties are also applicable for use with the mass markers of this invention.

[0083] The components of the mass markers of this invention are preferably fragmentation resistant so that the site of fragmentation of the markers can be controlled by the introduction of a linkage that is easily broken by Collision Induced Dissociation. Aryl ethers are an example of a class of fragmentation resistant compounds that may be used in this invention. These compounds are also chemically inert and thermally stable. WO 99/32501 discusses the use of poly-ethers in mass spectrometry in greater detail and the content of this application is incorporated by reference.

[0084] In the past, the general method for the synthesis of aryl ethers was based on the Ullmann coupling of aryl bromides with phenols in the presence of copper powder at about 200° C. (representative reference: H. Stetter, G. Duve, Chemische Berichte 87 (1954) 1699). Milder methods for the synthesis of aryl ethers have been developed using a different metal catalyst but the reaction temperature is still between 100 and 120° C. (M. Iyoda, M. Sakaitani, H. Otsuka, M. Oda, Tetrahedron Letters 26 (1985) 477). This is a preferred route for the production of poly-ether mass labels. See the synthesis of FT77 given in the examples below. A recently published method provides a most preferred route for the generation of poly-ether mass labels as it is carried out under much milder conditions than the earlier methods (D. A. Evans, J. L. Katz, T. R. West, Tetrahedron Lett. 39 (1998) 2937).

[0085] The present invention also provides a set of two or more probes, each probe in the set being different and being attached to a unique mass label or a unique combination of mass labels, from a set or an array of mass labels as defined above.

[0086] Further provided is an array of probes comprising two or more sets of probes, wherein each probe in any one set is attached to a unique mass label, or a unique combination of mass labels, from a set of mass labels as defined above, and wherein the probes in any one set are attached to mass labels from the same set of mass labels, and each set of probes is attached to mass labels from unique sets of mass labels from an array of mass labels as defined above.

[0087] In one embodiment, each probe is preferably attached to a unique combination of mass labels, each combination being distinguished by the presence or absence of each mass label in the set of mass labels and/or the quantity of each mass label attached to the probe. This is termed the “mixing mode” of the present invention, since the probes may be attached to a mixture of mass labels.

[0088] In the above aspects, the nature of the probe is not particularly limited. However, preferably each probe comprises a biomolecule. Any biomolecule can be employed, but the biomolecule is preferably selected from a DNA, an RNA, an oligonucleotide, a nucleic acid base, a peptide, a polypeptide, a protein and an amino acid.

[0089] In one preferred embodiment, this invention provides sets and arrays of mass labelled analytes, such as nucleotides, oligonucleotides and polynucleotides, of the form:

Analyte-L′-label or Analyte-L′-S-label

[0090] Wherein L′ and S are as defined above, and label is a mass label from any of the sets and arrays defined above.

[0091] In the above aspect, the nature of the analyte is not particularly limited. However, preferably each analyte comprises a biomolecule. Any biomolecule can be employed, but the biomolecule is preferably selected from a DNA, an RNA, an oligonucleotide, a nucleic acid base, a peptide, a polypeptide, a protein and an amino acid.

[0092] In one embodiment, each analyte is preferably attached to a unique combination of mass labels, each combination being distinguished by the presence or absence of each mass label in the set of mass labels and/or the quantity of each mass label attached to the probe. As mentioned above, this is termed the “mixing mode” of the present invention, since the probes may be attached to a mixture of mass labels.

[0093] As mentioned above, the present invention provides a method of analysis, which method comprises detecting an analyte by identifying by mass spectrometry a mass label or a combination of mass labels unique to the analyte, wherein the mass label is a mass label from a set or an array of mass labels as defined above. The type of method is not particularly limited, provided that the method benefits from the use of the mass labels of the present invention to identify an analyte. The method may be, for example, a method of sequencing nucleic acid or a method of profiling the expression of one or more genes by detecting quantities of protein in a sample. The method is especially advantageous, since it can be used to readily analyse a plurality of analytes simultaneously. However, the method also has advantages for analysing single analytes individually, since using the present mass labels, mass spectra which are cleaner than conventional spectra are produced, making the method accurate and sensitive.

[0094] In a further preferred embodiment, the present invention provides a method, which method comprises:

[0095] (a) contacting one or more analytes with a set of probes, or an array of probes, each probe in the set or array being specific to at least one analyte, wherein the probes are as defined above; and

[0096] (b) identifying an analyte, by detecting the probe specific to that analyte.

[0097] In this embodiment it is preferred that the mass label is cleaved from the probe prior to detecting the mass label by mass spectrometry.

[0098] The nature of the methods of this particular embodiment is not especially limited. However, it is preferred that the method comprises contacting one or more nucleic acids with a set of hybridisation probes. The set of hybridisation probes typically comprises a set of up to 256 4-mers, each probe in the set having a different combination of nucleic acid bases. This method may be suitable for identifying the presence of target nucleic acids, or alternatively can be used in a stepwise method of primer extension sequencing of one or more nucleic acid templates.
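The figure of 256 follows from the four nucleic acid bases taken four at a time (4⁴ = 256). A minimal sketch of the enumeration is given below; the function name `all_kmers` and the base alphabet string are illustrative assumptions, not part of the described method.

```python
from itertools import product

BASES = "ACGT"  # assumed DNA alphabet, for illustration only


def all_kmers(k):
    """Return every possible sequence of k bases, in lexicographic order."""
    return ["".join(p) for p in product(BASES, repeat=k)]


probes = all_kmers(4)
print(len(probes))  # 256
print(probes[:3])   # ['AAAA', 'AAAC', 'AAAG']
```

Each of these sequences would need its own mass label (or combination of labels) for the full probe set to be resolved in a single analysis.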

[0099] The mass labels of the present invention are particularly suitable for use in methods of 2-dimensional analysis, primarily due to the large number of labels that can be simultaneously distinguished. The labels may thus be used in a method of 2-dimensional gel electrophoresis, or in a method of 2-dimensional mass spectrometry.

[0100] Thus, in one aspect the present invention provides a method of 2-dimensional mass spectrometric analysis, which method comprises:

[0101] (a) providing one or more analytes, each analyte being labelled with a mass label or a combination of mass labels unique to that analyte, wherein the mass labels are from a set or array of mass labels as defined above;

[0102] (b) cleaving the mass labels from the analytes;

[0103] (c) detecting the mass labels;

[0104] (d) dissociating the mass labels in the mass spectrometer, to release the mass marker moieties from the mass normalisation moieties;

[0105] (e) detecting the mass marker moieties; and

[0106] (f) identifying the analytes on the basis of the mass spectrum of the mass labels in the first dimension and the mass spectrum of the mass marker moieties in the second dimension.

[0107] In this method, preferably in step (c) mass labels of a chosen mass or a chosen range of masses are selected for detection. It is also preferred that in step (e) mass marker moieties having a specific mass or a specific range of masses are selected for detection.

[0108] In another aspect, the present invention provides a method of analysis, which method comprises:

[0109] (a) subjecting a mixture of labelled analytes to a first separation treatment on the basis of a first property of the analytes;

[0110] (b) subjecting the resulting separated analytes to a second separation treatment on the basis of a second property of the analytes; and

[0111] (c) detecting an analyte by detecting its label; wherein the analytes are labelled with a mass label from a set or an array of mass labels as defined above.

[0112] The property of the analytes is not particularly limited. However, in this embodiment in step (a) and/or step (b) the analytes are preferably separated according to their length or mass. It is further preferred that in step (a) and/or step (b) the analytes are separated according to their iso-electric point. Typically, the analytes comprise one or more proteins, polypeptides, peptides, amino acids or nucleic acids, or fragments thereof. It is particularly preferred that gel electrophoresis is employed in each of the separation steps. In this embodiment, the method is a method of 2-dimensional gel electrophoresis.

[0113] In a further aspect, the present invention provides a method for characterising nucleic acid, which comprises:

[0114] (a) providing a population of nucleic acid fragments, each fragment having cleavably attached thereto a mass label from a set or an array of mass labels as defined above for identifying a feature of that fragment;

[0115] (b) separating the fragments on the basis of their length;

[0116] (c) cleaving each fragment to release its mass label; and

[0117] (d) determining each mass label by mass spectrometry to relate the feature of each fragment to the length of the fragment.

[0118] Typically, the method of this aspect of the invention is used for characterising cDNA. Preferably, this method comprises:

[0119] (a) exposing a sample comprising a population of one or more cDNAs or fragments thereof to a cleavage agent which recognises a predetermined sequence and cuts at a reference site at a known displacement from the predetermined sequence proximal to an end of each cDNA or fragment thereof so as to generate a population of terminal fragments;

[0120] (b) ligating to each reference site an adaptor oligonucleotide which comprises a recognition site for a sampling cleavage agent;

[0121] (c) exposing the population of terminal fragments to a sampling cleavage agent which binds to the recognition site and cuts at a sampling site of known displacement from the recognition site so as to generate in each terminal fragment a sticky end sequence of a predetermined length of up to 6 bases, and of unknown sequence;

[0122] (d) separating the population of terminal fragments into sub-populations according to sequence length; and

[0123] (e) determining each sticky end sequence by:

[0124] (i) probing with an array of labelled hybridisation probes, the array containing all possible base sequences of the predetermined length;

[0125] (ii) ligating those probes which hybridised to the sticky end sequences; and

[0126] (iii) determining which probes are ligated by identification and preferably quantification of the labels;

[0127] wherein the labels are mass labels from a set or an array as defined above.

[0128] In this method, the population of terminal fragments is preferably separated by capillary electrophoresis, HPLC or gel electrophoresis.

[0129] In a still further aspect of the present invention, there is provided a method for characterising nucleic acid, which method comprises generating Sanger ladder nucleic acid fragments from one or more nucleic acid templates, in the presence of at least one labelled terminating base, and identifying the length of the fragment, and the terminating base of the fragment, wherein the label is specific to the terminating base and is a mass label from a set or an array as defined above.

[0130] In this aspect of the invention, it is preferred that all four terminating bases are present in the same reaction zone. The method typically comprises generating Sanger ladder nucleic acid fragments from a plurality of nucleic acid templates present in the same reaction zone, and for each nucleic acid fragment produced identifying the length of the fragment, the identity of the template from which the fragment is derived and the terminating base of the fragment, wherein prior to generating the fragments, a labelled primer nucleotide or oligonucleotide is hybridised to each template, the label on each primer being specific to the template to which that primer hybridises to allow identification of the template. The type of label identifying the template is not particularly limited. However, it is preferred that the label identifying the template is a mass label from a set or an array as defined above.

[0131] A further aspect of the method of the present invention provides a method for sequencing nucleic acid, which method comprises:

[0132] (a) obtaining a target nucleic acid population comprising one or more single-stranded DNAs to be sequenced, each of which is present in a unique amount and bears a primer to provide a double-stranded portion of the nucleic acid for ligation thereto;

[0133] (b) contacting the nucleic acid population with an array of hybridisation probes, each probe comprising a label cleavably attached to a known base sequence of predetermined length, the array containing all possible base sequences of that predetermined length and the base sequences being incapable of ligation to each other, wherein the contacting is carried out in the presence of ligase under conditions to ligate to the double-stranded portion of each nucleic acid the probe bearing the base sequence complementary to the single-stranded nucleic acid adjacent the double-stranded portion thereby to form an extended double-stranded portion which is incapable of ligation to further probes; and

[0134] (c) removing all unligated probes; followed by the steps of:

[0135] (d) cleaving the ligated probes to release each label;

[0136] (e) recording the quantity of each label; and

[0137] (f) activating the extended double-stranded portion to enable ligation thereto; wherein

[0138] (g) steps (b) to (f) are repeated in a cycle for a sufficient number of times to determine the sequence of the or each single-stranded nucleic acid by determining the sequence of release of each label,

[0139] wherein the labels of the hybridisation probes are each from a set or an array as defined above.

[0140] In this aspect of the invention, it is preferred that the hybridisation probes are a set of 256 4-mers, each probe in the set having a different combination of nucleic acid bases.

[0141] As already mentioned, it is preferred in all of the above aspects of the present methods that two or more analytes are detected by simultaneously identifying their mass labels or combinations of mass labels by mass spectrometry.

[0142] The mixing mode of the present invention may be applied to all of the above methods. In this embodiment, each analyte is identified by a unique combination of mass labels from a set or array of mass labels, each combination being distinguished by the presence or absence of each mass label in the set or array and/or the quantity of each mass label.
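As a rough illustration of how many analytes presence/absence coding alone could distinguish, the sketch below enumerates every non-empty combination of a small label set. The label names are placeholders, and excluding the empty combination (an unlabelled analyte) is an assumption of this sketch; quantity coding, which the text also allows, is not modelled.

```python
from itertools import combinations


def presence_codes(labels):
    """Yield every non-empty subset of `labels` (presence/absence coding only)."""
    for size in range(1, len(labels) + 1):
        for subset in combinations(labels, size):
            yield frozenset(subset)


labels = ["M0X3", "M1X2", "M2X1", "M3X0"]  # placeholder label names
codes = list(presence_codes(labels))
print(len(codes))  # 15, i.e. 2**4 - 1
```

With n labels this gives 2ⁿ − 1 distinguishable combinations, so the number of codes grows much faster than the number of labels that must be synthesised.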

[0143] If the method is applied to two or more analytes simultaneously, in some aspects it is preferred that the analytes are separated according to their mass, prior to detecting the mass label by mass spectrometry. Preferably, the separation step is a chromatographic step, such as liquid chromatography or gel electrophoresis. The present labels of type 2 are particularly advantageous in these embodiments, since the aggregate mass of all labels in the set is the same, thus during a chromatographic separation step, the mobility of all analytes is equally affected by the labels.

[0144] Typically, in the present methods, the mass spectrometer employed to detect the mass label comprises one or more mass analysers, which mass analysers are capable of allowing ions of a particular mass, or range of masses, to pass through for detection and/or are capable of causing ions to dissociate. Preferably ions of a particular mass or range of masses specific to one or more known mass labels are selected using the mass analyser, the selected ions are dissociated, and the dissociation products are detected to identify ion patterns indicative of the selected mass labels. In particularly preferred methods, the mass spectrometer comprises three quadrupole mass analysers. In this embodiment, generally a first mass analyser is used to select ions of a particular mass or mass range, a second mass analyser is used to dissociate the selected ions, and a third mass analyser is used to detect resulting ions.

[0145] A preferred embodiment of the above methods provides a method of analysing mass labelled analyte molecules, comprising the steps of:

[0146] 1. Cleaving the mass label from its associated molecule of interest.

[0147] 2. Ionising the cleaved mass label.

[0148] 3. Selecting ions of a predetermined mass to charge ratio corresponding to the mass to charge ratio of the preferred ions of known mass labels in a mass analyser.

[0149] 4. Inducing dissociation of these selected ions by collision.

[0150] 5. Detecting the collision products to identify collision product ions that are indicative of the selected mass labels.

[0151] It is preferred that the process of cleaving the mass label from its associated nucleic acid takes place within a mass spectrometer, preferably within the ion source. It is also preferred that the mass labels are pre-ionised. In this embodiment the labels need only be transferred from a liquid or solid phase into the gas phase (if the mass labels are in a liquid or solid phase). Typically, the step of ionising the mass label results from cleavage of the mass label within the ion source of the mass spectrometer.

[0152] Preferably, the third step of selecting the ions of a predetermined mass to charge ratio is performed in the first mass analyser of a serial instrument. The selected ions are then channelled into a separate collision cell where they are collided with a gas or a solid surface according to the above fourth step. The collision products are then channelled into a further mass analyser of a serial instrument to detect collision products according to the above fifth step. Typical serial instruments for use in the present invention include triple quadrupole mass spectrometers, tandem sector instruments and quadrupole time of flight mass spectrometers.

[0153] It is further preferred that the above third step of selecting the ions of a predetermined mass to charge ratio, the fourth step of colliding the selected ions with a gas and the fifth step of detecting the collision products are performed in the same zone of the mass spectrometer. This may, for example, be effected in ion trap mass analysers and Fourier Transform Ion Cyclotron Resonance mass spectrometers.

[0154] In a further preferred embodiment, the invention provides a method of analysing mass labelled analyte molecules, comprising the steps of:

[0155] 1. Cleaving the mass label from its associated analyte molecule.

[0156] 2. Ionising the cleaved mass label.

[0157] 3. Selecting ions of a predetermined mass to charge ratio corresponding to the mass to charge ratio of the preferred ions of known mass labels in a mass analyser.

[0158] 4. Inducing dissociation of these selected ions by collision.

[0159] 5. Detecting more than one of the collision products to identify collision product ion patterns that are indicative of the selected mass labels which in turn identify the labelled nucleic acid.

[0160] In preferred aspects of this embodiment of this invention, the process of cleaving the mass label from its associated nucleic acid takes place within a mass spectrometer, preferably within the ion source.

[0161] In certain preferred aspects of this embodiment, the mass labels are pre-ionised and need only be transferred from a liquid or solid phase into the gas phase (if the mass labels are in a liquid or solid phase).

[0162] In other preferred aspects, the step of ionising the mass label results from cleavage of the mass label within the ion source of the mass spectrometer.

[0163] In certain aspects, the third step of selecting the ions of a predetermined mass to charge ratio is performed in the first mass analyser of a serial instrument. The selected ions are then channelled into a separate collision cell where they are collided with a gas or a solid surface according to the above fourth step. The collision products are then channelled into a further mass analyser of a serial instrument to detect collision products according to the above fifth step. Typical serial instruments include triple quadrupole mass spectrometers, tandem sector instruments and quadrupole time of flight mass spectrometers.

[0164] In other preferred aspects, the third step of selecting the ions of a predetermined mass to charge ratio, the fourth step of colliding the selected ions with a gas and the fifth step of detecting the collision products are performed in the same zone of the mass spectrometer.

[0165] This may be effected in ion trap mass analysers and Fourier Transform Ion Cyclotron Resonance mass spectrometers, for example.

[0166] Tandem Mass Spectrometry

[0167] At the expense of some loss in sensitivity, great gains in selectivity can be achieved through use of tandem mass spectrometry (MS/MS) to detect the mass labels of the present invention. For the purposes of illustrating the invention some discussion is now provided regarding tandem mass spectrometry, exemplified here by reference to the triple quadrupole mass spectrometer. The triple quad allows easy illustration of the principle of MS/MS.

[0168] The quadrupole mass analyser is essentially a mass filter which can at any moment be set to allow ions of only a particular mass to charge ratio to pass through. A quadrupole comprises four parallel rod-shaped electrodes which form a channel. A direct current potential superimposed by a sinusoidal radio frequency potential is applied to the rod electrodes. Ions entering into the channel formed by the parallel rods follow complex trajectories and for a particular DC potential and radio frequency potential, only ions with a predetermined mass to charge ratio will have a stable trajectory which will lead them through the channel. By changing the applied potentials the quadrupole can be made to scan across a full range of mass to charge ratios up to about 4000.

[0169] A triple quad (Q) layout is shown in FIG. 1. Three separate quadrupole mass analysers are linked in series. The first quadrupole is referred to hereafter as Q1, similarly the second will be referred to as Q2 and the third as Q3. Quadrupoles Q1 and Q3 are typically used in scanning modes. The speed of scanning is very high. Alternatively, Q1 or Q3 can be used as “gates”, which allow through only selected ions. Quadrupole Q2 is used in a non-scanning mode, in which it acts as an ion focusing device. All ions pass through Q2 when there is a high vacuum. When a gas is introduced into Q2, incoming ions collide with gas and many of the ions gain sufficient energy to fragment. This is “Collision Induced Dissociation” (CID).

[0170] Consider one particular use of the triple quad. Suppose ions are produced in the ion source (A⁺, B⁺, C⁺, D⁺, etc.). If all of these ions are allowed through Q1, with Q2 and Q3 operating in a scanning mode, then a full mass spectrum is generated (FIG. 11, Mass Spectrum 1). Mass Spectrum 1 shows a spectrum comprising the molecular ions A⁺ through to D⁺ and assorted fragment ions.

[0171] Now suppose Q1 is set to pass only A⁺ ions and Q2 is at low pressure. The A⁺ ions pass through Q2 and Q3 and are detected (FIG. 12, Mass Spectrum 2). The new mass spectrum is now “cleaned” of the other ions (B⁺, C⁺, etc.), which have been rejected by Q1. Multiple analytes can be detected from the same sample introduced into the mass spectrometer by setting Q1 to scan over a limited series of masses corresponding to particular ion species of the analytes of interest. This is termed “Selective Ion Monitoring”.

[0172] A triple quadrupole can be used to gain further selectivity, though. It is possible that A⁺ ions may come from several sources (e.g. several ions could have the same mass to charge ratio of 100 but have different compositions such as C₇H₁₆, C₆H₁₂O, C₅H₈O₂ etc.). Suppose there are two compositions of A⁺ ions (A₁⁺, A₂⁺) both of the same nominal mass (FIGS. 13 and 14, Mass Spectra 3 and 4). If A⁺ ions are selected in Q1 and CID is carried out in Q2, a scan of Q3 will give the spectrum shown in FIG. 15, Mass Spectrum 5. This is a “mixed” spectrum.
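The point that different compositions can share a nominal mass of 100 can be checked with a short calculation. The sketch below is illustrative only: it uses integer (nominal) atomic masses, a deliberately simple formula parser that handles only plain formulas without brackets, and takes C₇H₁₆, C₆H₁₂O and C₅H₈O₂ as the three example compositions.

```python
import re

# Nominal (integer) masses of the most abundant isotopes.
NOMINAL = {"C": 12, "H": 1, "O": 16, "N": 14, "F": 19}


def nominal_mass(formula):
    """Sum nominal atomic masses for a simple formula such as 'C6H12O'."""
    total = 0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += NOMINAL[element] * (int(count) if count else 1)
    return total


for formula in ("C7H16", "C6H12O", "C5H8O2"):
    print(formula, nominal_mass(formula))  # each prints 100
```

A single stage of mass selection cannot separate such ions at unit resolution, which is why the fragmentation step described next is needed.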

[0173] Suppose it is known that ions P⁺, Q⁺ (or even just P⁺) can unambiguously reveal that A₁⁺ is present. That is to say, the fragmentation (reaction) A₁⁺→P⁺+Q⁺ is known to occur. Instead of scanning all ions in Q3, it is set to detect only P⁺ ions. Thus, after leaving the ion source, ions A⁺, B⁺ . . . are reduced to just A⁺ (=A₁⁺, A₂⁺) ions going into Q2.

[0174] After CID, only fragment ions P⁺ are selected and these are characteristic of only A₁⁺. This is said to be “Single or Selective Reaction Monitoring”, which is highly selective. In a more generalised sense, the full spectrum of ions entering Q1 (FIG. 11, Mass Spectrum 1) is reduced in Q3 to P⁺ (FIG. 16, Mass Spectrum 6) and these ions are known to relate only to A₁⁺.
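The gating logic described above can be mimicked with a toy model in which each quadrupole is a function. This is only a sketch of the selection logic, not of instrument physics: every m/z value, the second parent's fragments R⁺ and S⁺, and the function names are hypothetical and chosen purely for illustration.

```python
# Hypothetical ions leaving the ion source (name -> m/z).
ION_SOURCE = {"A1+": 100, "A2+": 100, "B+": 114, "C+": 128, "D+": 142}

# Hypothetical fragmentation table used to stand in for CID in Q2.
FRAGMENTS = {
    "A1+": {"P+": 57, "Q+": 43},
    "A2+": {"R+": 71, "S+": 29},
}


def q1_select(ions, mz):
    """Q1 as a gate: keep only ions with the chosen m/z."""
    return {name: m for name, m in ions.items() if m == mz}


def q2_cid(selected, fragments):
    """Q2 with collision gas: replace each parent by its known fragments."""
    return [
        (parent, frag, frag_mz)
        for parent in selected
        for frag, frag_mz in fragments.get(parent, {}).items()
    ]


def q3_select(products, mz):
    """Q3 as a gate: keep only fragments with the chosen m/z."""
    return [p for p in products if p[2] == mz]


selected = q1_select(ION_SOURCE, 100)   # A1+ and A2+ both pass Q1
products = q2_cid(selected, FRAGMENTS)  # the "mixed" fragment spectrum
detected = q3_select(products, 57)      # only P+ survives, so A1+ is present
print(detected)  # [('A1+', 'P+', 57)]
```

The two gates together define one "reaction" (parent m/z 100 to fragment m/z 57); ions that fail either gate never reach the detector.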

[0175] In some of the ensuing discussions, the examples refer to the use of the mass labels of this invention to identify nucleotides or oligonucleotides. It is equally possible that the labels of this invention can be used with proteins or peptides or other analytes and oligonucleotides are mentioned for the purpose of example. For the purposes of analysing oligonucleotides it is assumed that the mass labels are attached to the oligonucleotide covalently via a cleavable linker. The linker may be cleavable by a variety of mechanisms, including thermal cleavage, chemical cleavage, cone voltage cleavage or photo-cleavage. In the following discussion of the behaviour of mass labels it is assumed that the labels have been cleaved from their associated nucleic acids during or prior to ionisation. Preferred cleavable linkers and their methods of use are disclosed in GB patent applications GB 9815163.2 and GB 9815164.0. The preferred cleavage process is represented schematically in FIG. 20.

[0176] According to the first aspect of this invention the principle of Selected Ion Monitoring (SIM) coupled to Selected Reaction Monitoring (SRM) can be applied to mass marking techniques giving a 2-dimensional detection process. If A₁⁺ was an ion from a mass label and therefore of known composition and fragmentation pattern then no matter how many ions were produced in the ionisation step, the mass label could be identified without there being any interference from other ions by gating A⁺ ions in the first quadrupole of a triple quad and then detecting A₁⁺ fragmentation products, i.e. by gating only P⁺ ions in the third quadrupole of a triple quadrupole. It would not matter which M/Z range was examined and it is no longer necessary to find “clean” windows in the mass spectrum.

[0177] As mentioned above, one aspect of this invention provides mass labels which can be represented schematically by the formula M-L-X. As an example A₁⁺ could be the molecular ion for the label shown below:

[0178] M is thus a benzyl group, L is an amide bond and X is a pyridyl group. The amide bond linking the benzyl ring to the pyridyl ring is particularly susceptible to cleavage by collision. Thus, on collision, A₁⁺ produces the fragment ion below:

[0179] and this would represent P⁺. Thus, detection of P⁺ means A₁⁺ is present and that one of the labels is present. The label has been selectively identified from all other ions and this effectively eliminates “background” contamination. This means that labelled analytes do not need to be exhaustively purified and that the labels do not need to be cleaved and separated from the analyte outside the mass spectrometer. This principle can be generalised to provide a useful class of compounds for use as mass labels, all of which have a general structure M-L-X where M is connected to X via a scissile bond L, such as an amide bond and X is the ion that is detected by SRM. Thus X is analogous to the cleavage product shown above and referred to as P⁺.

[0180] According to one aspect of this invention the mass label structure illustrated above can be generalised to provide a useful set of mass labels all with the same mass but which are still easily resolved by SRM. Let M₀, M₁ . . . M₄ and X₀, X₁ . . . X₄ be isotopic forms of the halves of M-L-X where L is an amide bond linking M and X. The example above can be used again. If this structure is substituted with fluorine, the components shown in FIG. 2 can be generated. These label components can be combined to form a mass label MX (ignoring the cleavable bond L for the moment) as follows:

M₀X₄; M₁X₃; M₂X₂; M₃X₁; M₄X₀

[0181] These five substances have exactly the same mass (FIG. 3). Thus, if a mass label was selected in Q1 of a triple quadrupole, only ions of mass=M_(m)X_(n) (m=0-4, n=4-0) would be selected. Q1 could be set to “look” for only MX ions. If CID is effected in Q2, then Q3 could be set to pass only ions X₀, X₁, X₂, X₃ and X₄ as shown in FIG. 21.

[0182] Therefore, all at the same mass, there would be 5 mass labels selected in Q1 and the cleavage reactions shown below can be identified in Q3. If mass 139 were detected in Q3, it must have come from M₃X₁ and so on.

M₀X₄⁺→X₄⁺ (m/z 193)

M₁X₃⁺→X₃⁺ (m/z 175)

M₂X₂⁺→X₂⁺ (m/z 157)

M₃X₁⁺→X₁⁺ (m/z 139)

M₄X₀⁺→X₀⁺ (m/z 121)
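The five reactions above amount to a lookup from the fragment m/z seen in Q3 back to the label that produced it. A minimal sketch follows; the m/z values are those listed above, the 18-unit step is read off from the spacing of those values (consistent with replacing one H by one F at nominal mass), and the function names are illustrative assumptions.

```python
X0_MZ = 121   # m/z of the unsubstituted fragment X0, from the list above
F_SHIFT = 18  # spacing between successive fragments in the list above


def x_mz(n):
    """m/z of fragment X_n, i.e. X0 carrying n fluorine substituents."""
    return X0_MZ + F_SHIFT * n


# Every label has m + n = 4, so fragment X_n can only come from M_(4-n)X_n.
Q3_TO_LABEL = {x_mz(n): f"M{4 - n}X{n}" for n in range(5)}


def identify(q3_mz):
    """Return the label implied by a Q3 fragment m/z, or None if unknown."""
    return Q3_TO_LABEL.get(q3_mz)


print(identify(139))  # M3X1
print(identify(193))  # M0X4
```

Because all five parents share one mass, Q1 stays on a single setting and only Q3 needs to step between the five fragment values.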

[0183] The selection process of this method can be visualised as a two-dimensional mass spectrum shown in FIG. 17, mass spectrum 7.

[0184] In an alternative approach, a different set of mass labels can be synthesised. In this mode of analysis SRM is combined with “Selected Ion Monitoring” (SIM). In the SIM mode of analysis, the first quadrupole (Q1) selectively scans over predetermined masses gating only ions with the predetermined masses.

[0185] Considering M₀, M₁, M₂, M₃, M₄ and X₀ from FIGS. 2 and 3 again, these label components can be combined to give 5 labels with different masses, M₀X₀, M₁X₀, M₂X₀, M₃X₀ and M₄X₀. Now suppose Q1 of a triple quadrupole is set to select these 5 masses, then Q3 need only be set to detect 1 mass (X₀) as in FIG. 4.

[0186] Thus, the mass spectrometer identifies only 1 fixed ion (X₀). Since X₀ must come from M₀X₀, M₁X₀, M₂X₀, M₃X₀, M₄X₀ only and it is known when these have been selected in Q1 then this provides an alternative mode of mass marking. Five different analytes can now be identified by one of five specific “single reactions”:

M₀X₀→X₀

M₁X₀→X₀

M₂X₀→X₀

M₃X₀→X₀

M₄X₀→X₀

[0187] This generates a different 2-dimensional mass spectrum shown in FIG. 18, mass spectrum 8.

[0188] The two approaches above can be combined. Suppose M₀, M₁, M₂, M₃ are chosen to represent the first base of a dinucleotide. The second base is characterised by X₀, X₁, X₂, X₃ to give 16 different mass labels as shown in Table 3 below:

TABLE 3
Dinucleotide | AA | AC | AG | AT
Mass Label | M₀X₃ | M₀X₂ | M₀X₁ | M₀X₀
Dinucleotide | CC | CA | CG | CT
Mass Label | M₁X₂ | M₁X₃ | M₁X₁ | M₁X₀
Dinucleotide | GG | GA | GC | GT
Mass Label | M₂X₁ | M₂X₃ | M₂X₂ | M₂X₀
Dinucleotide | TT | TA | TC | TG
Mass Label | M₃X₀ | M₃X₃ | M₃X₂ | M₃X₁

[0189] Each mass label will have one of 7 different masses which can be selected in the first mass analyser of a tandem instrument. The collision products identified in the second mass analyser will identify the dimer. Thus with 8 mass label components, it is possible to generate 16 mass labels. The full 2-dimensional mass spectrum for all of these labels is shown in FIG. 19, mass spectrum 9. Similarly, if 256 mass labels are required, two sets of 16 components, i.e. M₀ to M₁₅ and X₀ to X₁₅, would generate sufficient labels, where each label would have one of 31 different masses.
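The counts quoted above (16 labels on 7 masses, 256 labels on 31 masses) can be reproduced with a few lines. The sketch assumes, as the fluorine-substituted example suggests, that each step in the index of M or X adds the same fixed mass increment, so that the total mass of M_mX_n depends only on m + n; the function names are illustrative.

```python
from itertools import product


def label_table(n_m, n_x):
    """Map each label (m, n) to its mass class m + n, in substitution units."""
    return {(m, n): m + n for m, n in product(range(n_m), range(n_x))}


def summarise(n_m, n_x):
    """Return (number of labels, number of distinct total masses)."""
    table = label_table(n_m, n_x)
    return len(table), len(set(table.values()))


print(summarise(4, 4))    # (16, 7)
print(summarise(16, 16))  # (256, 31)
```

Labels sharing a total mass are told apart by their X fragment in the second analyser, while labels sharing an X fragment are told apart by their total mass in the first.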

[0190] According to another aspect of this invention, it is alsopossible to generate arrays of sets of mass labels using mass seriesmodifying groups. According to this aspect of the invention a set oflabels, where each label in the set has the same mass but can beresolved by SRM, can be expanded into an additional set of labels bylinking each member of the set to a mass series modifying group whichwill shift the mass of each member of the set by a pre-determined amountthus generating a second set of labels whose total mass is differentfrom the first set. Thus two distinct sets of mass label ions would begated by SIM in the first quadrupole of a mass analyser and thecollision products would then be analysed in the third quad bymonitoring the same fragment species for both sets of labels. Clearly asmany different sets of labels as can be comfortably analysed in a massspectrometer can be generated by using different mass series modifyinggroups.

[0191] Mass series modifying (S) groups are preferably fragmentation resistant groups such that each S group, when linked to each member of a set of labels, generates a new set of labels that is clearly resolvable from every other in an array of such labels. In this context resolvable means that each set of labels in the array is preferably separated from every other set by approximately 4 daltons at least. This is to ensure that isotope peaks from one label do not overlap in the mass spectrum with those of another label. In preferred embodiments of this aspect of the invention, the S groups are substituted or unsubstituted cyclic groups, such as aryl groups, cycloalkyl groups and heterocyclic groups, preferably linked to the members of a set of SRM resolvable mass marking groups by an ether linkage. Each set in the array may have the same S group, but having a different level of substitution, to ensure that each set is distinct from all other sets. An example of an array of such labels is shown in FIG. 6. In this array, F atoms are used as substituents (adjuster moieties), but other substituents such as methyl groups could be employed. It should be clear that an array of such labels will have very similar effects on the mobility of any associated analyte molecules.

[0192] Additional sets of labels could be added to such an array using methyl substituted phenyl groups and also phenyl groups substituted with both methyl and fluoro groups. Methyl groups differ in mass from fluoro groups by just less than 4 daltons and so a significant array of labels could be generated whose effect on the mobility of associated analytes would be minimal.
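The statement that methyl and fluoro groups differ by just less than 4 daltons can be illustrated with a small calculation. The sketch below uses standard monoisotopic atomic masses (rounded values supplied here, not taken from the specification) and assumes each substituent replaces one ring hydrogen; the helper `set_offsets` is a hypothetical name.

```python
# Monoisotopic atomic masses in daltons (standard values, rounded).
H = 1.007825
C = 12.000000
F = 18.998403

# Mass added to a phenyl ring when one ring hydrogen is replaced.
fluoro_shift = F - H                # H -> F
methyl_shift = (C + 3 * H) - H      # H -> CH3
difference = fluoro_shift - methyl_shift


def set_offsets(max_fluoro, max_methyl):
    """Mass offset for each (fluoro count, methyl count) substitution pattern."""
    return {
        (f, m): f * fluoro_shift + m * methyl_shift
        for f in range(max_fluoro + 1)
        for m in range(max_methyl + 1)
    }


print(f"fluoro shift: {fluoro_shift:.4f} Da")
print(f"methyl shift: {methyl_shift:.4f} Da")
print(f"difference:   {difference:.4f} Da")
```

Under these assumptions the difference comes out a little under 4 daltons, consistent with the text, so mixed methyl and fluoro substitution patterns give sets spaced at roughly the preferred separation.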

[0193] In other preferred embodiments of this aspect of this invention, the S groups are oligomers or polymers of cyclic groups such as aryl groups, cycloalkyl groups and heterocyclic groups, which may also be substituted. Specifically, preferred S groups are poly-aryl ethers. An example of such an array is shown in FIG. 7.

[0194] According to a further aspect of this invention the principles described above can be taken further, by labelling analytes with a distinct combination of the mass labels of the present invention. As mentioned above, this embodiment is termed mixing mode labelling. When any individual analyte of a large number must be identified, for example in combinatorial chemistry, a mixture of labels, e.g. M₀X₃, M₁X₂, M₂X₁, M₃X₀ is chosen. The mixture is attached to an analyte, such that a particular quantity of each label is present. For instance, aM₀X₃+bM₁X₂+cM₂X₁+dM₃X₀ where a=b=c=d=1 (FIG. 5). If equal parts of four mass labels (a=b=c=d=0.25) are coupled to an analyte in the same reaction, the chemical joining reaction would not discriminate between them. When an oligonucleotide is labelled, the oligo is mass marked with more than one label per nucleotide or oligonucleotide.

[0195] Consider three mass labels of the form shown below in Table 4:

TABLE 4
Name   Structure               Total Mass   Collision Product Mass (mass marker moiety)
P      [structure not shown]   199          109
Q      [structure not shown]   199          108
R      [structure not shown]   199          107

[0196] where “*” can represent ²H or ¹³C isotopes at the position marked. It should be clear that different substituents can be used, such as fluorine or methyl groups for example. One mixing mode is that shown in FIG. 8. Eight distinct patterns can be generated by a combination of the presence or absence of distinct labels in a mixture coupled to an analyte molecule.

[0197] Consider a different sort of pattern where the ratios of each of five labels are varied when they are coupled to their associated analyte as shown in Table 5 below:

TABLE 5
P   Q   R   S   T
2   2   2   2   2
2   2   2   1   2
2   2   2   0   2
2   2   1   2   2
.   .   .   .   .
.   .   .   .   .
.   .   .   .   .
0   0   1   0   2
0   0   0   2   2
0   0   0   1   2
0   0   0   0   2

[0198] With 4 mass labels, P, Q, R and S, which can be present at 3 different ratios, i.e. none, 1 or 2, there are effectively 3 different entities for each label, which means that there are a possible 81 (3⁴) different mass spectral patterns that can be generated. It is preferable that there is also one of these labels whose ratio to the other components remains constant (T), to act as an internal label against which the mass spectrometer data system can compare the relative ratios of P, Q, R and S. This means that with a mixture of 5 labels all 64 combinations of natural nucleotides in a 3-mer oligonucleotide could be identified.
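The ratio coding just described can be sketched as follows. The particular assignment of patterns to 3-mers and the decoding rule (normalising each label against T, taken to be present at level 2 as in Table 5) are illustrative assumptions, as are all names in the code.

```python
from itertools import product

BASES = "ACGT"
LEVELS = (0, 1, 2)            # relative amount of each variable label
VARIABLE_LABELS = "PQRS"      # T is held constant as the internal reference

# All ratio patterns for P, Q, R, S, listed in the descending order of Table 5.
patterns = sorted(product(LEVELS, repeat=len(VARIABLE_LABELS)), reverse=True)
trimers = ["".join(t) for t in product(BASES, repeat=3)]

# Hypothetical assignment: the i-th 3-mer receives the i-th pattern.
encoding = dict(zip(trimers, patterns))
decoding = {pattern: trimer for trimer, pattern in encoding.items()}


def read_pattern(intensities, reference_level=2):
    """Convert measured intensities into ratio levels using T as reference."""
    t = intensities["T"]
    return tuple(
        round(reference_level * intensities[name] / t) for name in VARIABLE_LABELS
    )


measured = {"P": 10.1, "Q": 4.9, "R": 0.2, "S": 9.8, "T": 10.0}
pattern = read_pattern(measured)
print(pattern, decoding.get(pattern))
```

Only 64 of the 81 available patterns are needed for the 3-mers, which leaves spare codes that could be reserved for controls.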

[0199] In the above example P, Q, R, S and T can be labels of the form shown in FIG. 3, thus the five labels have the same mass and can be gated from background contaminants in the first quadrupole of a triple quadrupole or a Q-TOF instrument, for example. The fragmentation patterns formed as a result of collision with a bath gas in the second quadrupole of a triple quadrupole and detection in the third quadrupole are shown in FIG. 9.

[0200] It may be seen that the principle can be extended. In some aspects of the present invention it is desirable to label the 256 possible 4-mers, using the above strategy. It is necessary to generate 7 different labels which can be mixed in all of the possible combinations of ratios shown above. Alternatively, if the labels are of the form shown in FIG. 6 or 7, then 4 sets of 81 codings of the sort shown in the example above can be generated by using the 4 different mass series modifying groups to generate different sets of 5 labels as shown in FIG. 6. This generates sufficient labels to encode all possible 256 4-mers.
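The label counts quoted above follow from simple arithmetic, sketched below under the assumption that each variable label can take three ratio levels and that one further label is kept constant as a reference. The function name is hypothetical.

```python
def variable_labels_needed(n_codes, levels=3):
    """Smallest number of variable labels whose ratio patterns cover n_codes."""
    count = 0
    while levels ** count < n_codes:
        count += 1
    return count


# One extra label is held constant as the internal reference.
labels_for_3mers = variable_labels_needed(4 ** 3) + 1
labels_for_4mers = variable_labels_needed(4 ** 4) + 1

# Alternative: four mass series, each carrying 3**4 ratio codings.
array_capacity = 4 * 3 ** 4

print(labels_for_3mers, labels_for_4mers, array_capacity)
```

Five variable labels give only 243 patterns, one short of what 256 4-mers require, which is why a sixth variable label (seven labels in total) is needed in the single-set approach.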

[0201] The principle of this aspect of the invention can be extended still further. Consider a library of DNA 4-mers comprising all 256 possible combinations of the natural nucleotides. Each 4-mer in the series can be represented as a number from 1 to 256, i.e. AAAA would be 1, and AAAC would be 2 through to TTTT which would be 256.

[0202] The numbers 1 to 256 can be represented in a binary form, for example in the way numbers could be represented in a memory register of a computer. In a register there is a series of switches which represent the numbers 2⁸, 2⁷, 2⁶, 2⁵, 2⁴, 2³, 2², 2¹ and 2⁰. To represent any of the numbers from 1 to 256, the switches are turned on and off so that the sum of the binary powers represents the original decimal number, as shown in Table 6 below:

TABLE 6
       2⁸    2⁷    2⁶    2⁵    2⁴    2³    2²    2¹    2⁰
  1    Off   Off   Off   Off   Off   Off   Off   Off   On
  2    Off   Off   Off   Off   Off   Off   Off   On    Off
  3    Off   Off   Off   Off   Off   Off   Off   On    On
  4    Off   Off   Off   Off   Off   Off   On    Off   Off
  .    .     .     .     .     .     .     .     .     .
  .    .     .     .     .     .     .     .     .     .
  .    .     .     .     .     .     .     .     .     .
255    Off   On    On    On    On    On    On    On    On
256    On    Off   Off   Off   Off   Off   Off   Off   Off

[0203] An analogous representation of these numbers can be achieved with mass label molecules where each switch in the register is represented by the presence or absence of a particular molecule. Thus to identify a 4-mer one could label the 4-mer with the mixture of labels that represents the number that identifies that 4-mer, e.g. if AACG is represented by the number 7, it can be identified by labelling the 4-mer with a mixture of a molecule that represents 2² with molecules that represent 2¹ and 2⁰.

[0204] Practically speaking, these molecules can be represented as a series of molecules based on a core molecule substituted with different numbers of a particular substituent or isotope, e.g. different numbers of a mass adjuster moiety, such as fluorine atoms or different deuterium isotopes. Thus 2⁰ can be represented by the core molecule with no fluorine substituents, 2¹ can be represented by the core molecule with 1 fluorine substituent and similarly 2⁸ can be represented by the core molecule with 8 fluorine substituents. When these molecules are analysed by mass spectrometry, they can be combined with a complementary component to give 9 isobaric tags which can be analysed in a tandem instrument.

[0205] Thus, in the case of the 4-mer AACG, this oligo can be labelled with labels 0, 1 and 2 from the labels shown above. Clearly all other possible 4-mers can be represented in this binary fashion and require only 8 basic labels to identify them such as shown in FIG.
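The binary coding of 4-mers can be sketched as below. The numbering follows the text (AAAA is 1, AAAC is 2, TTTT is 256, with the bases ordered A, C, G, T); the function names are hypothetical, and the sketch uses the nine register positions 2⁰ to 2⁸ of Table 6, since the number 256 itself occupies the 2⁸ position.

```python
BASES = "ACGT"


def kmer_number(kmer):
    """Number a k-mer from 1 upward, reading it as a base-4 value (A=0 .. T=3)."""
    value = 0
    for base in kmer:
        value = value * 4 + BASES.index(base)
    return value + 1


def label_mixture(number, n_labels=9):
    """Indices of the labels (binary powers) that must be present for `number`."""
    return [bit for bit in range(n_labels) if (number >> bit) & 1]


def decode_mixture(bits, k=4):
    """Recover the k-mer from the set of label indices detected."""
    value = sum(1 << bit for bit in bits) - 1
    bases = []
    for _ in range(k):
        value, digit = divmod(value, 4)
        bases.append(BASES[digit])
    return "".join(reversed(bases))


number = kmer_number("AACG")
print(number, label_mixture(number), decode_mixture(label_mixture(number)))
```

If the 4-mers were instead numbered from 0 to 255, eight binary positions would suffice, at the cost of AAAA being represented by the absence of every label.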

[0206] DNA Sequencing Using SRM

[0207] The analysis of Sanger Sequencing Ladders can be effected efficiently using mass labels of the form discussed above. Conventional DNA sequencing according to the Sanger methodology uses a DNA polymerase to add numerous dideoxy/deoxynucleotides to an oligonucleotide primer, annealed to a single stranded DNA template, in a template specific manner. Random termination of this process is achieved when terminating nucleotides, i.e. the dideoxynucleotides, are incorporated into the template complement. A “DNA ladder” is produced when the randomly terminated strands are separated on a denaturing polyacrylamide gel or in a capillary. Sequence information is gathered, generally using polyacrylamide gel electrophoresis to separate the terminated fragments by length, followed by detecting the “DNA ladder”. In conventional semi-automated and automated DNA sequencers, such as the ABI377 from Perkin Elmer or MegaBACE from Molecular Dynamics, fluorescent labels F₁, F₂, F₃, F₄ are used to identify the four terminating bases A, C, G, T, either through incorporating the fluorescent label into one of the terminating nucleotides or the primer used in the reaction. This ladder is then read by looking for the four dyes passing a detector which scans the gel or a capillary. Other fluorescent detection formats are possible.

[0208] Sequencing a Single Template

[0209] In the mass spectral method, the fluorescent labels are exchanged for mass labels (e.g. M₀X₄; M₁X₃; M₂X₂; M₃X₁; M₄X₀ shown in FIG. 3). The dideoxy terminator for Adenine is now labelled with M₁X₃. Similarly the dideoxy terminator for Cytosine is now labelled with M₂X₂, the terminator for Guanosine is labelled with M₃X₁ and the terminator for Thymidine is labelled with M₄X₀. As the bands elute from the capillary, they are sprayed, in-line, into the ion source of a suitable tandem mass analyser such as a triple quadrupole where the mass marked nucleic acids are analysed according to an aspect of this invention. Typically, in the ion source the labels are cleaved from the terminating base of each fragment in the ladder and enter the first mass analyser, Q1 of a triple quadrupole. Q1 is set to gate only molecular ions of MX, while Q3 is set to look for labels X₀ through X₄. It may be desirable that one of the masses, say X₄, should be used as an internal standard, i.e. it is always present and X₀, X₁, X₂ and X₃ are examined in relation to X₄.
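The reading of a ladder from the Q3 signals can be sketched as below. The base-to-fragment assignment follows the labelling given above (X₃ for A, X₂ for C, X₁ for G, X₀ for T, with X₄ as the internal standard); the intensity values, the `min_ratio` threshold and the function names are hypothetical and serve only to illustrate the idea.

```python
# Fragment detected in Q3 -> terminating base, per the labelling described above.
FRAGMENT_TO_BASE = {"X3": "A", "X2": "C", "X1": "G", "X0": "T"}
INTERNAL_STANDARD = "X4"


def call_base(q3_intensities, min_ratio=0.1):
    """Call the base for one eluting band from its Q3 fragment intensities.

    Each fragment is expressed relative to the internal standard; the
    threshold min_ratio is an arbitrary illustrative value.
    """
    standard = q3_intensities.get(INTERNAL_STANDARD, 0.0)
    if standard <= 0:
        return "N"  # no reference signal, so no call is made
    ratios = {
        fragment: q3_intensities.get(fragment, 0.0) / standard
        for fragment in FRAGMENT_TO_BASE
    }
    best = max(ratios, key=ratios.get)
    return FRAGMENT_TO_BASE[best] if ratios[best] >= min_ratio else "N"


def read_ladder(bands):
    """Read a sequence from bands listed in order of elution."""
    return "".join(call_base(band) for band in bands)


bands = [
    {"X4": 100.0, "X3": 80.0, "X2": 2.0},
    {"X4": 100.0, "X2": 75.0, "X1": 3.0},
    {"X4": 100.0, "X1": 90.0},
    {"X4": 100.0, "X0": 60.0, "X3": 1.0},
]
print(read_ladder(bands))
```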

[0210] In an alternative approach, the four terminating nucleotides can be labelled with the four labels shown in FIG. 4 so that the dideoxy terminator for Adenine is now labelled with M₁X₀. Similarly the dideoxy terminator for Cytosine is now labelled with M₂X₀, the terminator for Guanosine is labelled with M₃X₀ and the terminator for Thymidine is labelled with M₄X₀. The label M₀X₀ can be used as an internal standard if desired. In this embodiment, Q1 of a triple quadrupole is set to gate molecular ions of labels M₄X₀ through M₀X₀, while Q3 is set to look for labels X₀.

[0211] In addition to nucleotide terminator labelled sequencing, primer labelled sequencing can be performed. Details of primer labelled sequencing, which can employ the mass labels of the present invention, are provided in PCT/GB98/02048.

[0212] Multiplexed Sequencing of Templates With Mass Labels

[0213] The mass labels of this invention permit more than one template to be analysed simultaneously, since many more than four labels can be developed. This means that multiple sets of four labels can be generated to permit analysis of multiple templates according to the methodology described above, which is based on the methods devised originally by Sanger. Details regarding the multiplexed sequencing of nucleic acid templates which can employ the mass labels of the present invention are provided in PCT/GB98/02048.

[0214] Mass labels of the form shown in FIG. 6 or FIG. 7 can be used to multiplex the analysis of multiple DNA sequences. Each set of five labels, resolvable in the mass spectrometer from every other set by a distinct mass series modifier, can be used to identify a single template, with a spare label remaining for use as a size/quantity standard if desired. However, sets of 4 labels are sufficient for sequencing, and size standards are not essential. It is thus possible, for example, to use the array of 20 labels shown in FIG. 6 to analyse the Sanger reaction products of 5 templates simultaneously.
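The two ways of dividing the 20-label array among templates can be sketched as follows. The sketch assumes the array consists of 4 mass series modifying groups each carrying 5 SRM-resolvable labels, and the way labels are dealt out to templates is an illustrative choice; all names are hypothetical.

```python
from itertools import product

BASES = "ACGT"

# Each label is identified by (mass series group, member within the set).
array = list(product(range(4), range(5)))


def assign_templates(labels, labels_per_template):
    """Deal labels out to templates, four for the bases plus optional standard."""
    if labels_per_template < len(BASES):
        raise ValueError("at least four labels are needed per template")
    plan = []
    last_start = len(labels) - labels_per_template
    for start in range(0, last_start + 1, labels_per_template):
        chunk = labels[start:start + labels_per_template]
        entry = dict(zip(BASES, chunk))
        # Any label beyond the four base labels serves as a size/quantity standard.
        entry["standard"] = chunk[4] if labels_per_template > 4 else None
        plan.append(entry)
    return plan


with_standard = assign_templates(array, 5)
without_standard = assign_templates(array, 4)
print(len(with_standard), len(without_standard))
```

With a standard, each template keeps all of its labels within one mass series; without one, five templates fit, but some templates then draw their four labels from two adjacent mass series.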

[0215] Gene Expression Profiling

[0216] Various methods of analysing populations of complementary DNA derived from poly-adenylated messenger RNA have been developed. A number of these methods are based on detecting different sized amplification or restriction products by electrophoretic separations of amplified cDNA libraries. In general these techniques are based on generating characteristic restriction fragments or amplification products from the members of a complementary DNA (cDNA) library derived from poly-adenylated messenger RNA.

[0217] Differential Display (Liang and Pardee, Science 257, 967-971, 1992) is the classical method of electrophoresis based gene expression profiling. The concepts of this technique have since been developed further, resulting in improved successors. Expression profiling methods based on “molecular indexing” using type IIS or type IP restriction endonucleases such as Sibson (PCT/GB93/0145) or Kato (EP 0 735 144 A1) are examples of one class of successors. In particular WO 98/48047 discloses a molecular indexing method based on capillary electrophoresis mass spectrometry (CEMS).

[0218] In this method cDNAs are synthesised using anchored and biotinylated poly-thymidine primers, which ensure that all cDNAs are terminated with a short poly-A tail of fixed length. In an “anchored primer” cDNA preparation, poly-A carrying mRNAs are captured and primed using an oligonucleotide of about 18 deoxythymidine residues with one of the three remaining bases at the 3′ end to anchor the primer at the end of the poly-A tract. Biotinylation of the primers allows the cDNAs to be immobilised on an avidinated solid phase support. These captured cDNAs may be cleaved with an ordinary type II restriction endonuclease. This leaves a 3′ terminal restriction fragment on the solid support while other fragments are washed away. An adapter is ligated to the resulting known sticky-end. The adapter is designed to carry the binding site for a type IIs restriction endonuclease. These enzymes bind their target sequence but cleave the underlying DNA at a defined number of bases away from the binding site. Certain of these enzymes produce a staggered cut; FokI, for example, will generate an ambiguous 4 bp sticky-end. If a population of cDNAs is treated with such an enzyme the sticky end will be exposed at the adaptered terminus of each cDNA in the population. A family of adapter molecules is used to probe those 4 exposed bases. With a 4 bp ambiguous sticky-end there are 256 possible candidates. To identify the probes, they are tagged with mass labels using a cleavable linker, so that a unique mass label identifies each of the 256 possible 4 bp adapters. This results in a population of fragments with varying lengths according to where the ordinary type II restriction endonuclease cut them and with one of 256 possible mass labelled adapters at the 5′ terminus of the cDNA.

[0219] The mass labelled 3′ restriction fragments are then separated on the basis of their length, using capillary electrophoresis, followed by analysis of the mass labels ligated to the termini of the cDNA fragments. The CE column feeds directly into an electrospray mass spectrometer or equivalent mass spectrometer. On ionisation in the mass spectrometer the labels cleave from their associated restriction fragments. The quantity of each mass label present in each band, corresponding to a different restriction fragment length, eluting from the capillary electrophoresis column is determined. This process gives a signature for each cDNA that can be used to search a database.

[0220] This technique preferably uses 256 mass labels. Using conventional approaches to mass labelling would result in an array of mass tags, separated by about 4 daltons, spanning a mass range of more than a thousand daltons. It is unlikely that an array of such labels could be generated where all the tags had the same effect on the mobility of the associated cDNA restriction fragments. This would mean that complex correction algorithms would have to be used to account for differences in mobility and allow accurate determination of fragment length. The mass markers and associated mass labels of this invention, however, are eminently suitable in the above method for generating arrays of mass markers whose effect on the mobility of associated analyte molecules is the same, allowing direct determination of fragment length with high sensitivity and excellent signal to noise ratios.

[0221] A second class of electrophoretic techniques is based on the use of ordinary type II restriction endonucleases, which are used to introduce primer sequences into cDNA restriction fragments. PCR amplification with labelled primers leads to the generation of distinct restriction fragments, which can be used to identify their associated mRNA. Such methods include that described in U.S. Pat. No. 5,712,126, which discloses a method of introducing adapters into restriction endonuclease digested cDNA fragments which allow selective amplification and labelling of 3′ terminal cDNA fragments. Similarly, WO 99/02727 discloses a method of amplifying 3′ terminal restriction fragments using solid phase supports and PCR primers which probe the unknown sequence adjacent to a known restriction site. In this technology cDNAs are prepared using biotinylated anchor primers, which ensure that all cDNAs are terminated with a short poly-A tail of fixed length and can be immobilised on a solid phase substrate. The poly-T primer may additionally carry a primer sequence at its 5′ terminus. The captured cDNAs are then cleaved with an ordinary type II restriction endonuclease. An adapter is ligated to the resulting known sticky-end. The adapter is designed to carry a primer sequence. The resulting double stranded construct is then denatured. The strand that is not immobilised can be washed away if desired. A family of primers complementary to the adapter primer with an overlap of 4 bases into the unknown sequence adjacent to the adapter primer is added to the denatured mixture. With a 4 base overlap there are 256 possible primers. To identify the probes, they are tagged with mass labels using a cleavable linker, so that each of the 256 possible 4 bp overlaps is identified by a label that is uniquely identifiable in a mass spectrometer. This results in a population of fragments with varying lengths according to where the ordinary type II restriction endonuclease cut them and with one of 256 possible mass labelled primers at the 5′ terminus of the cDNA. The cycle of denaturing and primer extension can be performed as many times as desired. If only the adapter primer sites are used, a linear amplification can be performed. This causes smaller distortion of cDNA quantification than exponential amplification. If exponential amplification is desired then the poly-T oligos used to trap the mRNAs must carry a primer site as well. Exponential amplification may be desirable if small tissue samples must be analysed despite the potential for distortions of cDNA frequencies.

[0222] Again, the mass labelled 3′ restriction fragments are separated on the basis of their length, using capillary electrophoresis, followed by analysis of the mass labels at the termini of the cDNA fragments. This technique, like that disclosed in WO 98/48047, is preferably practised with 256 mass labels and would thus benefit in the same way from the advantageous features of the mass labels of this invention.

[0223] Thus, in a further aspect of this invention, there is provided a method of analysis comprising the steps of:

[0224] 1. Providing a population of mass labelled nucleic acid fragments of different lengths, where the mass labels are indicative of a feature of the labelled nucleic acids.

[0225] 2. Separating the labelled fragments on the basis of their size.

[0226] 3. Detaching the mass labels from the labelled fragments.

[0227] 4. Detecting the mass labels in a mass spectrometer.

[0228] In certain embodiments of this aspect of the invention, the assay determines the sequence of a nucleic acid or a series of nucleic acids. In sequencing embodiments based on the generation of Sanger ladders the mass label identifies the terminating nucleotide of each fragment and each fragment is identified by a set of four labels. In Sanger sequencing embodiments the labels are introduced as mass labelled primers or labelled terminating nucleotides.

[0229] In other embodiments of this aspect of the invention, the assay is used to determine the identity and quantity of expressed RNA molecules. In preferred embodiments, the mass labelled nucleic acids are generated according to the methods disclosed in WO 98/48047 or WO 99/02727. In embodiments using these methods mass labels are introduced into the nucleic acid fragments by ligation of mass labelled adapters or by extension of mass labelled primers, respectively. For one of ordinary skill in the art, it should be clear that other methods of gene expression profiling based on the size of nucleic acid fragments, such as those disclosed in PCT/GB93/0145, EP-A-0 735 144 or U.S. Pat. No. 5,712,126, can be adapted for use with the labels of this invention.

[0230] In preferred embodiments of this invention the step of separating the analytes on the basis of size is carried out using capillary electrophoresis or high performance liquid chromatography, using, for example, systems such as those provided by Transgenomic, Inc. (San Jose, Calif., USA) and disclosed in U.S. Pat. No. 5,585,236, U.S. Pat. No. 5,772,889 and other applications. Preferably the separation is performed on-line with a mass spectrometer.

[0231] In preferred embodiments, the step of detaching the mass labels from their associated analytes takes place within the ion source of the mass spectrometer. Linkers that allow a mass label to be easily cleaved from its associated analyte in a mass spectrometer ion source are disclosed in PCT/GB98/00127. Compounds that improve the sensitivity of detection of a mass label by mass spectrometry are disclosed in PCT/GB98/00127.

[0232] For one of ordinary skill in the art, it should be clear that other sizing assays could be adapted for use with the mass labels of this invention, including, for example, the multiplexed genotyping assay disclosed by Grossman P. D. et al. in Nucleic Acids Research Oct. 25, 1994;22(21):4527-34. This assay would benefit greatly from the ability to multiplex to higher orders and still resolve the size of fragments easily.

[0233] Protein Expression Profiling and 2-Dimensional Gel Electrophoresis

[0234] Techniques for profiling proteins, that is to say cataloguing the identities and quantities of all the proteins expressed in a tissue, are not well developed in terms of automation or throughput. The classical method of profiling a population of proteins is by two-dimensional electrophoresis (R. A. Van Bogelen., E. R. Olson, “Application of two-dimensional protein gels in biotechnology”, Biotechnol. Ann. Rev., 1:69-103, 1995). In this method a protein sample extracted from a biological sample is separated on a narrow gel strip. This first separation usually separates proteins on the basis of their iso-electric point. The entire gel strip is then laid against one edge of a rectangular gel, such as a polyacrylamide gel. The separated proteins in the strip are then electrophoretically separated in the second gel on the basis of their size, e.g. by Sodium Dodecyl Sulphate Polyacrylamide Gel Electrophoresis (SDS PAGE). This methodology is slow and very difficult to automate. It is also relatively insensitive in its simplest incarnations. Once the separation is complete the proteins must be visualised. This typically involves staining the gel with a reagent that can be detected visually or by fluorescence. Radiolabelling and autoradiography are also used. In other methods fluorescent dyes may be covalently linked to proteins in a sample prior to separation. Covalent addition of a dye can alter the mobility of a protein and so this is sometimes less preferred, particularly if comparisons are to be made with public databases of 2-dimensional gel images. Having visualised the proteins in a gel it is usually necessary to identify the proteins in particular spots on the gel. This is typically done by cutting the spots out of the gel and extracting the proteins from the gel matrix. The extracted proteins can then be identified by a variety of techniques. Preferred techniques involve digestion of the protein, followed by microsequencing. A number of improvements have been made to increase resolution of proteins by 2-D gel electrophoresis and to improve the sensitivity of the system. One method to improve the sensitivity of 2-D gel electrophoresis and its resolution is to analyse the protein in specific spots on the gel by mass spectrometry (Jungblut P., Thiede B. “Protein identification from 2-D gels by MALDI mass spectrometry”, Mass Spectrom. Rev. 16, 145-162, 1997). One such method is in-gel tryptic digestion followed by analysis of the tryptic fragments by mass spectrometry to generate a peptide mass fingerprint. If sequence information is required, tandem mass spectrometry analysis can be performed.

[0235] At present 2-D analysis is a relatively slow “batch” process. It is also not very reproducible and it is expensive to analyse a gel. Since most of the costs in a gel based analysis are in the handling of each gel it would be desirable to be able to multiplex a number of samples on a 2-D gel simultaneously. If it were possible to label the proteins in different samples with a different, independently detectable tag, then the proteins in each sample could be analysed simultaneously on the same gel. This would be especially valuable for studies where it is desirable to follow the behaviour of the same proteins in a particular organism at multiple time points, for example in monitoring how a bacterium responds to a drug over a predetermined time course. Similarly, when comparing biopsy material from multiple patients with the same disease with corresponding controls, it would be desirable to ensure that the same protein from different samples ends up at the same spot on the gel. Running all the samples on the same gel would allow different samples to be compared without having to be concerned about the reproducibility of the separation of the gel. To achieve this requires a series of labels whose effect on the mobility of the proteins in different samples will be the same, so that a particular protein which is labelled with a different label in each sample will still end up at the same position in the gel irrespective of its label.

[0236] More recently attempts have been made to exploit mass spectrometry to analyse whole proteins that have been fractionated by liquid chromatography or capillary electrophoresis (Dolnik V. “Capillary zone electrophoresis of proteins”, Electrophoresis 18, 2353-2361, 1997). In-line systems exploiting capillary electrophoresis mass spectrometry have been tested. The analysis of whole proteins by mass spectrometry, however, suffers from a number of difficulties. The first difficulty is the analysis of the complex mass spectra resulting from multiple ionisation states accessible by individual proteins. The second major disadvantage is that the mass resolution of mass spectrometers is at present quite poor for high molecular weight species, i.e. for ions that are greater than about 4 kilodaltons in mass, so resolving proteins that are close in mass is difficult. A third disadvantage is that further analysis of whole proteins by tandem mass spectrometry is difficult as the fragmentation patterns for whole proteins are extremely complex and difficult to interpret.

[0237] PCT/GB98/00201 and PCT/GB99/03258 describe methods of characterising complex mixtures of proteins by isolating C-terminal peptides from the proteins in the mixtures and analysing them by mass spectrometry. The methods described can be used to determine whether proteins are present or absent in a sample but would not give comparative data between samples. The methods do not describe techniques for analysis of multiple samples simultaneously, which would be necessary for quantitative comparison of protein expression levels in multiple samples.

[0238] EP-A-0 594 164 describes a method of isolating a C-terminal peptide from a protein in a method to allow sequencing of the C-terminal peptide using N-terminal sequencing reagents. In this method the protein of interest is digested with an endopeptidase which cleaves at the C-terminal side of lysine residues. The resultant peptides are reacted with DITC polystyrene, which reacts with all free amino groups. N-terminal amino groups that have reacted with the DITC polystyrene can be cleaved with trifluoroacetic acid (TFA), thus releasing the N-terminus of all peptides. The epsilon-amino group of lysine is not cleaved, however, and all non-terminal peptides are thus retained on the support and only C-terminal peptides are released. According to this document, the C-terminal peptides are recovered for micro-sequencing.

[0239] Nature Biotechnology 17: 994-999 (1999) discloses the use of “isotope encoded affinity tags” for the capture of peptides from proteins to allow protein expression analysis. In this article, the authors describe the use of a biotin linker, which is reactive to thiols, to capture peptides with cysteine in them. A sample of protein from one source is reacted with the biotin linker and cleaved with an endopeptidase. The biotinylated cysteine containing peptides can then be isolated on avidinated beads for subsequent analysis by mass spectrometry. Two samples can be compared quantitatively by labelling one sample with the biotin linker and labelling the second sample with a deuterated form of the biotin linker. Each peptide in the samples is then represented as a pair of peaks in the mass spectrum where the relative peak heights indicate their relative expression levels.

[0240] The method in this paper has a number of limitations. The first limitation of this “isotope encoding” method is the reliance on the presence of thiols in a protein; many proteins do not have thiols while others have several. In a variation on this method, linkers may be designed to react with other side chains such as amines, but since many proteins contain more than one lysine residue, multiple peptides per protein will be isolated in this approach. It is likely that this would not reduce the complexity of the sample sufficiently for analysis by mass spectrometry. A sample that contains too many species is likely to suffer from “ion suppression” in which certain species ionise preferentially over other species which would normally appear in the mass spectrum in a less complex sample. In general, capturing proteins by their side chains may give either too many peptides per protein or certain proteins will be missed altogether.

[0241] The second limitation of this approach is in the method used to compare the expression levels of proteins from different samples. Labelling each sample with a different isotope variant of the affinity tag results in an additional peak in the mass spectrum for each peptide in each sample, which means that if two samples are analysed together there will be twice as many peaks in the spectrum. Similarly, if three samples are analysed together, the spectrum will be three times more complex than for one sample alone. It might be feasible to attempt the comparison of two or three samples by this approach but this may well be the limit as the ever increasing numbers of peaks will increase the likelihood that two different peptides will have overlapping peaks in the mass spectrum.

[0242] A further limitation reported by the authors of the above paper is the mobility change caused by the tags. The authors report that peptides labelled with the deuterated biotin tag elute slightly after the same peptide labelled with the undeuterated tag.

[0243] In view of the above, it is a further aim of the present invention to provide an improved method of determining the identity and relative quantities of polypeptides in a number of samples of complex polypeptide mixtures simultaneously. It is a further aim of this aspect of the invention to ensure that all proteins are represented in the analysis. It is also an aim of this aspect of the invention to provide mass labels and techniques which allow multiple samples to be analysed simultaneously and quantitatively without increasing significantly the complexity of the mass spectrum when compared to the spectrum that would be obtained from a single sample alone. It is a final aim of this aspect of the invention to provide labels which have the same effect on the mobility of the labelled peptide, so that samples of the same peptide labelled with different tags will co-elute after a chromatographic separation.

[0244] Thus, a further preferred embodiment of this invention provides a method of analysing a protein sample containing more than one protein, the method comprising the steps of:

[0245] 1. Labelling peptides, polypeptides and/or proteins in the sample with at least one discretely resolvable mass label from the sets and arrays of this invention, such that each peptide, polypeptide and/or protein is labelled with a label or combination of labels unique to that protein.

[0246] 2. Analysing the labelled peptides, polypeptides and/or proteins by mass spectrometry, preferably according to an aspect of this invention e.g. tandem mass spectrometry, to detect the labels attached to the proteins. The labelled peptides in the sample may then be identified and their relative expression levels determined.

[0247] It is preferred that multiple samples are subjected to the above process. It is further preferred that for each of a number of samples, prior to labelling step (1) above, peptides are isolated from polypeptides in the mixture using a cleavage agent, especially a sequence specific cleavage agent. After labelling step (1), the samples may be pooled, if desired. Optionally, after labelling step (1) and/or pooling the samples, the peptides, polypeptides and/or proteins in the sample or samples may be separated by gel electrophoresis, iso-electric focusing, liquid chromatography or other appropriate means, preferably generating discrete fractions. These fractions may be bands or spots on a gel or liquid fractions from a chromatographic separation. Fractions from one separation may be separated further using a second separation technique. Similarly further fractions may be fractionated again until the proteins are sufficiently resolved for the subsequent analysis steps.

[0248] This aspect of the invention thus provides a further application of the labels and methods of this invention described above. A set or array of labels of the present invention can be used to increase the throughput of a 2-D gel electrophoresis analysis of the proteins in an organism. Each of the mass labels alters the mobility of its associated protein in the same way but is still independently detectable. In known uses of mass spectrometry to analyse proteins from a 2-D gel, such as peptide mass fingerprinting, it is required that the proteins be extracted from the gel and be purified to remove detergents such as SDS and other contaminants from the gel. The labels of this invention allow a relatively unpurified extract of proteins from the gel to be introduced directly into the mass spectrometer, and the associated labels can then be identified by the methods of this invention in a background of contaminating material.

[0249] In a particularly preferred embodiment of this aspect of the invention multiple samples are subjected to the following process:

[0250] 1. for each of a number of samples, isolating peptides from polypeptides in the mixture using sequence specific cleavage reagents;

[0251] 2. labelling the isolated peptides in each sample with the labels of this invention such that each sample is identified by a unique label;

[0252] 3. pooling the labelled samples;

[0253] 4. optionally separating the pooled and labelled peptides chromatographically or electrophoretically;

[0254] 5. analysing these labelled samples by tandem mass spectrometry to identify the labelled peptides in the sample and determine their relative expression levels.
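The final step of the process above, determining relative expression levels from the mass label peaks of one pooled peptide, can be sketched as follows. This is a minimal illustration only: the sample names and peak intensities are hypothetical, and simple normalisation to the summed label signal is an assumption rather than a procedure stated in this specification.

```python
def relative_levels(label_intensities):
    """Normalise mass label peak intensities so that they sum to 1.

    label_intensities maps a sample identifier to the intensity of the
    mass label peak detected for that sample (hypothetical values).
    """
    total = sum(label_intensities.values())
    if total <= 0:
        raise ValueError("no mass label signal detected")
    return {sample: i / total for sample, i in label_intensities.items()}


# Hypothetical label peak intensities for one peptide from three pooled samples.
peaks = {"sample_A": 1200.0, "sample_B": 600.0, "sample_C": 200.0}
levels = relative_levels(peaks)
for sample, level in levels.items():
    print(f"{sample}: {level:.2f}")
```

Because each sample carries its own label, one set of label peaks per peptide yields the relative quantities across all pooled samples.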

[0255] Another preferred embodiment of this aspect of the invention provides a method of analysing a series of protein samples, each sample containing more than one protein, the method comprising the steps of:

[0256] 1. Covalently reacting the proteins of each of the samples with at least one discretely resolvable mass label from the sets and arrays of this invention, such that the proteins of each sample are labelled with one or more mass labels that are different from the labels reacted with the proteins of every other sample.

[0257] 2. Pooling the mass labelled samples.

[0258] 3. Separating the pooled samples by gel electrophoresis, iso-electric focusing, liquid chromatography or other appropriate means to generate discrete fractions. These fractions may be bands or spots on a gel or liquid fractions from a chromatographic separation. Fractions from one separation may be separated further using a second separation technique. Similarly further fractions may be fractionated again until the proteins are sufficiently resolved for the subsequent analysis steps.

[0259] 4. Analysing the fractions by mass spectrometry, preferably according to an aspect of this invention, to detect the labels attached to the proteins.

[0260] A still further preferred embodiment of this aspect of the present invention provides a method of identifying a protein in a sample containing more than one protein, the method comprising the steps of:

[0261] 1. Covalently reacting the proteins of the sample with at least one discretely resolvable mass label from the sets and arrays of this invention.

[0262] 2. Separating the proteins by gel electrophoresis, iso-electric focusing, liquid chromatography or other appropriate means to generate discrete fractions. These fractions may be bands or spots on a gel or liquid fractions from a chromatographic separation. Fractions from one separation may be separated further using a second separation technique. Similarly further fractions may be fractionated again until the proteins are sufficiently resolved for the subsequent analysis steps.

[0263] 3. Digesting the proteins in the fraction with a sequence specific cleavage reagent.

[0264] 4. Optionally reacting the proteins in the sample with an additional mass label.

[0265] 5. Analysing the digested fractions by liquid chromatography mass spectrometry where the elution time of mass marked peptides from the liquid chromatography column step is determined by detecting the mass labels attached to the peptides. A mass spectrometry analysis is performed, preferably according to an aspect of this invention, to detect the labels attached to the proteins.

[0266] 6. Comparing the elution profile of the labelled peptides from the liquid chromatography mass spectrometry analysis of step 5 with profiles in a database to determine whether the protein has been previously identified.

[0267] A yet further preferred embodiment of this aspect of the present invention provides a method of identifying a protein from a series of protein samples, each sample containing more than one protein, the method comprising the steps of:

[0268] 1. Covalently reacting the proteins of each of the samples with at least one discretely resolvable mass label from the sets and arrays of this invention, such that the proteins of each sample are labelled with one or more mass labels that are different from the labels reacted with the proteins of every other sample.

[0269] 2. Pooling the mass labelled samples.

[0270] 3. Separating the proteins by gel electrophoresis, iso-electric focusing, liquid chromatography or other appropriate means to generate discrete fractions. These fractions may be bands or spots on a gel or liquid fractions from a chromatographic separation. Fractions from one separation may be separated further using a second separation technique. Similarly further fractions may be fractionated again until the proteins are sufficiently resolved for the subsequent analysis steps.

[0271] 4. Digesting the proteins in the fraction with a sequence specific cleavage reagent to generate characteristic peptides for each protein in the sample.

[0272] 5. Optionally reacting the proteins in the sample with an additional mass label.

[0273] 6. Analysing the digested fractions by liquid chromatography mass spectrometry where the elution time of mass marked peptides from the liquid chromatography column step is determined by detecting the mass labels attached to the peptides. A mass spectrometry analysis is performed, preferably according to an aspect of this invention, to detect the labels attached to the proteins.

[0274] 7. Comparing the elution profile of the labelled peptides from the liquid chromatography mass spectrometry analysis of step 6 with profiles in a database to determine whether the protein has been previously identified.
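The comparison of an elution profile with database profiles in the final step above might be sketched as follows. This is a hedged illustration only: the scoring rule (fraction of reference elution times matched within a tolerance), the tolerance, the acceptance threshold, the protein names and all elution times are assumptions introduced here and are not specified in this document.

```python
def profile_score(observed, reference, tol=0.5):
    """Fraction of reference elution times (minutes) matched by an observed time."""
    if not reference:
        return 0.0
    matched = sum(1 for r in reference if any(abs(r - o) <= tol for o in observed))
    return matched / len(reference)


def best_match(observed, database, tol=0.5, min_score=0.8):
    """Return (name, score) of the best database profile, or (None, score) if too weak."""
    if not database:
        return (None, 0.0)
    score, name = max((profile_score(observed, ref, tol), name)
                      for name, ref in database.items())
    return (name, score) if score >= min_score else (None, score)


# Hypothetical database of elution profiles for previously identified proteins.
database = {
    "protein_X": [5.1, 12.4, 20.0, 31.7],
    "protein_Y": [3.3, 12.5, 25.9],
}
observed = [5.0, 12.6, 19.8, 31.9]  # elution times of mass labelled peptides
match = best_match(observed, database)
print(match)
```

A result of `(None, score)` would indicate a protein that has not been previously identified under these assumed criteria.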

[0275] Step 1 of the above preferred embodiments of this invention involves covalently reacting a mass label of this invention to the reactive side chains of a population of proteins. It is well known in the art that the reactive side-chain functionalities can be selectively reacted. Reactive side-chains include lysine, serine, threonine, tyrosine and cysteine. Cysteine is often cross-linked with itself to form disulphide bridges. For the purposes of this invention it is not essential that these bridges be broken but cysteine side chains can be highly reactive and may be readily reacted with a variety of reagents. If disulphide bridges are present, these can be broken by reducing the disulphide bridge to a pair of thiols with mercaptoethanol. Thiols can be selectively capped by iodoacetate (Aldrich) under mildly basic conditions which promote the formation of a thiolate ion (Mol. Microbiol. 5: 2293, 1991). An appropriate mild base is a carbonate. For the purposes of this invention, a mass label of this invention whose reactive functionality is an iodoacetyl group can be reacted with the thiols of an analyte protein. In other embodiments the population of proteins may be treated with a mass marker whose reactive functionality is an isocyanate group. Isocyanates will react almost exclusively with the alpha-amino group at the N-terminus of the proteins and with any lysine epsilon-amino groups, i.e. with primary amines, under mild conditions, i.e. at room temperature in a neutral solvent, to give a urea derivative. These reagents can also be made to react with any hydroxyl bearing side-chains, such as serine, threonine and tyrosine side chains, at higher temperatures in the presence of an appropriate catalyst such as pyridine or a tin compound such as dibutyl stannyl laurate to give a urethane derivative. In an alternative embodiment the population of proteins can be treated with a mass marker whose reactive functionality is a silyl group such as chlorosilane. These compounds react readily with most reactive functional groups. Amine derivatives are not stable under aqueous conditions and so can be hydrolysed back to the free amine if that is desired. Sulphonyl chlorides can also be used as a reactive group on a mass label to selectively react the mass label with free amines such as lysine. Carboxylic acid side chains could also be reacted with the labels of this invention although it is usually necessary to activate these side chains to ensure that they will react. Acetic anhydride is commonly used for this purpose. This forms mixed anhydrides at free carboxylic acids which can then be reacted with a nucleophilic functionality such as an amine.

[0276] The above specific embodiments are intended only as examples illustrating preferred methods of selectively reacting side-chain functionalities with mass labels. A wide variety of reactive groups are known in the art and many of these can be used to complete the first steps of these aspects of this invention. It may also be desirable to react more than one type of side chain of the proteins in a sample with different mass labels. If multiple samples are to be analysed simultaneously then two or more labels can be used to label each sample. This allows more information to be derived from each protein to aid in its identification.

[0277] In step 3 and step 4 of the latter two embodiments of this aspect of the invention, the C-terminally modified proteins are then treated with a sequence specific cleavage agent. In some embodiments sequence specific endoproteinases such as trypsin, chymotrypsin, thrombin or other enzymes may be used. Cleavage agents may alternatively be chemical reagents. These are preferably volatile to permit easy removal of unreacted reagent. Appropriate chemical cleavage reagents include cyanogen bromide, which cleaves at methionine residues, and BNPS-skatole, which cleaves at tryptophan residues (D. L. Crimmins et al., Anal. Biochem. 187: 27-38, 1990).
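The effect of different sequence specific cleavage agents can be illustrated with a simplified in-silico digestion. The sketch below assumes idealised rules: cleavage on the C-terminal side of lysine and arginine for trypsin, of methionine for cyanogen bromide and of tryptophan for BNPS-skatole, with complete cleavage and no exceptions (for example, the reduced cleavage by trypsin before proline is ignored). The protein sequence is hypothetical.

```python
# Simplified cleavage rules: residues after which each agent is assumed to cut.
CLEAVAGE_RULES = {
    "trypsin": "KR",          # C-terminal side of lysine and arginine
    "cyanogen_bromide": "M",  # C-terminal side of methionine
    "bnps_skatole": "W",      # C-terminal side of tryptophan
}


def cleave(sequence, agent):
    """Return the fragments produced by complete cleavage with the given agent."""
    residues = CLEAVAGE_RULES[agent]
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in residues:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal fragment
    return fragments


protein = "ALKGMTWERSA"  # hypothetical sequence, one-letter amino acid codes
for agent in CLEAVAGE_RULES:
    print(agent, cleave(protein, agent))
```

Each agent yields a different set of characteristic peptides from the same protein, which is why the choice of cleavage agent determines the peptides available for labelling and analysis.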

[0278] In the above preferred embodiments of this aspect of the invention, the step of fractionating the proteins is preferably effected by performing 2-dimensional gel electrophoresis, using iso-electric focusing in the first dimension and SDS PAGE in the second dimension. The gel is then visualised to identify where proteins have migrated to on the gel. The spots can then be excised from the gel and the proteins are then extracted from the excised gel spot. These extracted proteins may then be analysed directly by electrospray mass spectrometry or some other suitable ionisation procedure. Alternatively further fractionation may be performed in-line with the mass spectrometer, such as HPLC mass spectrometry.

[0279] In step 4 and step 5, respectively, of the latter two preferred embodiments of this aspect of the invention, the digested proteins are optionally reacted with an additional mass label of this invention. This is of more significance to the latter preferred embodiment of this invention, where multiple samples are analysed simultaneously. Most enzymatic digestions and some of the chemical cleavage methods leave free amines on the resultant peptides of the digested fractionated proteins which can be reacted with a mass label. This means that the same label will appear on all peptides and can be detected selectively to maximise the sensitivity of this analysis.

[0280] In step 6 and step 7 of the latter two preferred embodiments of this aspect of the invention, the elution profile of the peptides generated by digesting the fractionated proteins is used to search a pre-formed database to determine whether the proteins have been previously identified. The peptides eluting from the liquid chromatography column into a mass spectrometer may be further analysed by tandem mass spectrometry to determine sequence information which can be used to identify proteins. Peptide sequence data can be used to search a protein sequence database or can be translated into nucleic acid sequence data to search nucleic acid sequence databases.

[0281] Isolation of Post-Translationally Modified Peptides

[0282] Carbohydrates are often present as a post-translational modification of proteins. These carbohydrates often have carbonyl groups. Carbonyl groups can be tagged, allowing proteins bearing such modifications to be detected or isolated. Biocytin hydrazide (Pierce & Warriner Ltd, Chester, UK) will react with carbonyl groups in a number of carbohydrate species (E. A. Bayer et al., Anal. Biochem. 170, 271-281, “Biocytin hydrazide: a selective label for sialic acids, galactose, and other sugars in glycoconjugates using avidin biotin technology”, 1988). Proteins bearing carbohydrate modifications in a complex mixture can thus be biotinylated. The protein mixture may then be treated with an endoprotease, such as trypsin, to generate peptides from the proteins. Biotinylated, hence carbohydrate modified, peptides may then be isolated using an avidinated solid support. A series of samples may be treated in this way and the peptides obtained may be reacted with the mass labels of this invention, such that peptides from each sample bear a mass label or combination of mass labels relatable to the peptide or peptides from that sample. Preferably peptides from each sample bear a different mass label. These mass tagged carbohydrate-bearing peptides may then be analysed by liquid chromatography tandem mass spectrometry.

[0283] A number of research groups have reported on the production of antibodies which bind to phosphotyrosine residues in a wide variety of proteins (see for example A. R. Frackelton et al., Methods Enzymol. 201, 79-92, “Generation of monoclonal antibodies against phosphotyrosine and their use for affinity purification of phosphotyrosine-containing proteins”, 1991 and other articles in this issue of Methods Enzymol.). This means that a significant proportion of proteins that have been post-translationally modified by tyrosine phosphorylation may be isolated by affinity chromatography using these antibodies as the affinity column ligand.

[0284] These phosphotyrosine binding antibodies can be used in the context of this invention to isolate peptides containing phosphotyrosine residues. Thus proteins in a complex mixture may be treated with a sequence specific endopeptidase to generate free peptides. These may then be passed through an anti-phosphotyrosine antibody column, which will retain peptides containing a phosphotyrosine group. A series of samples may be treated in this way and the peptides obtained may be reacted with the mass tags of this invention, such that peptides from each sample bear a mass label or combination of mass labels relatable to the peptide or peptides from that sample. Preferably peptides from each sample bear a different mass label. These mass labelled phosphotyrosine-bearing peptides may then be analysed by liquid chromatography tandem mass spectrometry.

[0285] Isolation of Terminal Peptides From Proteins

[0286] A preferred method of protein expression profiling according to the present invention is to isolate only one peptide from each protein in the sample. Provided that the isolated peptide fragment is of sufficient length, the fragment will be specific to its parent protein. In the first step of this aspect of the present invention, peptides are isolated from each protein in each of a number of samples of complex protein mixtures. In some embodiments of this aspect it is preferred that terminal peptides are isolated. Isolation of terminal peptides ensures that one and only one peptide per protein is isolated. Methods for isolating peptides from the termini of polypeptides are discussed in PCT/GB98/00201 and PCT/GB99/03258.

[0287] Thus, this aspect of the present invention provides a method of protein profiling, which method comprises:

[0288] (a) treating a sample comprising a population of a plurality of polypeptides with a cleavage agent which is known to recognise in polypeptide chains a specific amino acid residue or sequence and to cleave at a cleavage site, whereby the population is cleaved to generate peptide fragments;

[0289] (b) isolating a population of peptide fragments bearing as a reference terminus the N-terminus or the C-terminus of the polypeptide from which they were fragmented, each peptide fragment bearing at the other end the cleavage site proximal to the reference terminus;

[0290] (c) prior to or after isolating the peptide fragments, labelling each reference terminus of the polypeptides with a mass label, or a combination of mass labels from a set or an array of mass labels of the present invention, wherein each reference terminus is relatable to its label or combination of labels; and

[0291] (d) determining by mass spectrometry a signature sequence of one or more of the isolated fragments, which signature sequence is the sequence of a pre-determined number of amino acid residues running from the cleavage site; wherein a signature sequence characterises each polypeptide.

[0292] An alternative preferred method provided by this aspect of the present invention makes use of a second cleavage agent to generate further fragments, which may themselves be identified and used to characterise their parent polypeptide or protein. This method comprises:

[0293] (a) contacting a sample comprising one or more polypeptides with a first cleavage agent to generate polypeptide fragments;

[0294] (b) isolating one or more polypeptide fragments, each fragment comprising the N-terminus or the C-terminus of the polypeptide from which it was fragmented;

[0295] (c) prior to or after isolating the polypeptide fragments, labelling each terminus of the polypeptides with a mass label, or a combination of mass labels from a set or an array of mass labels of the present invention, wherein each terminus is relatable to its label or combination of labels; and

[0296] (d) identifying the isolated fragments by mass spectrometry;

[0297] (e) repeating steps (a)-(d) on the sample using a second cleavage agent that cleaves at a different site from the first cleavage agent; and

[0298] (f) characterising the one or more polypeptides in the sample from the fragments identified in steps (d) and (e).

[0299] In both of the above methods, the step of labelling the reference termini can take place before or after isolating the fragments and can also take place before the fragments are cleaved from their parent polypeptides or proteins, if desired.

[0300] Regarding the isolation of peptide fragments, in preferred embodiments of this aspect of the present invention, terminal peptides may be isolated from a complex mixture of proteins using a method comprising the steps of:

[0301] 1. Digesting the complex mixture of proteins completely with a Lys-C specific cleavage enzyme, i.e. a reagent that cuts at the peptide bond immediately adjacent to a lysine residue on the C-terminal side of that residue.

[0302] 2. Contacting the resultant peptides with an activated solid support that will react with free amino groups.

[0303] 3. Optionally reacting the captured peptides with a bifunctional reagent, which has at least one amine reactive functionality.

[0304] 4. Contacting the captured peptides with a reagent that cleaves at the alpha amino groups of each peptide on the support. All peptides that are not C-terminal will have a lysine residue covalently linking them to the solid support. Thus free C-terminal peptides are selectively released.

[0305] 5. Optionally contacting the released peptides with a second solid support that will react with the second reactive functionality of the bifunctional reagent used in step 3 to capture any peptides that did not react properly with the first support.

6. Recovering the peptides remaining free in solution.
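The selectivity of the steps above can be illustrated with an idealised in-silico sketch. It assumes complete Lys-C cleavage on the C-terminal side of every lysine, complete capture of all amino groups on the support, and removal of exactly one N-terminal residue from each peptide in step 4; the optional steps 3 and 5, which handle incomplete reactions, are not modelled. The protein sequence is hypothetical.

```python
def lys_c_digest(sequence):
    """Idealised Lys-C digestion: cut on the C-terminal side of every lysine (K)."""
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        if aa == "K":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def released_c_terminal_peptide(protein):
    """Return the peptide released from the support in step 4, or None.

    Fragments ending in lysine stay bound through the lysine side-chain amine.
    Only a lysine-free C-terminal fragment is released, shortened by the
    N-terminal residue removed in the cleavage step.
    """
    last = lys_c_digest(protein)[-1]
    if "K" in last:
        return None  # protein ends in lysine: fragment remains bound
    return last[1:]


protein = "MAGKLLSKTEVDA"  # hypothetical sequence, one-letter codes
print(lys_c_digest(protein))                 # all Lys-C fragments
print(released_c_terminal_peptide(protein))  # single released peptide
```

Under these idealised assumptions exactly one peptide per protein remains free in solution.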

[0306] In preferred embodiments of this method, the proteins in the complex mixture are denatured, reduced and treated with a reagent to cap thiols in the proteins. Typical protocols involve denaturing the proteins in a buffer at pH 8.5 with a high concentration of guanidine hydrochloride (6-8 M), as a denaturation reagent, in the presence of an excess of mercaptoethanol or dithiothreitol, as reducing agents, and an excess of a capping agent such as vinylpyridine.

[0307] In step 1 of this method, the complex mixture of proteins is completely digested with a Lys-C specific cleavage enzyme, which may be, for example, endopeptidase Lys-C from Lysobacter enzymogenes (Boehringer Mannheim).

[0308] In step 2 of this method, the resultant peptides are contacted with a solid support that reacts with amines. In preferred embodiments the solid phase support is derivatised with an isothiocyanate compound. In one embodiment the peptide population is reacted with isothiocyanato glass (DITC glass, Sigma-Aldrich Ltd, Dorset, England) in the presence of a base. This captures all peptides to the support through any free amino groups.

[0309] Step 3 is optional but preferred. It may be difficult to guarantee that all non-C-terminal peptides will react completely with the first solid support at both the lysine side-chain amino-group and the N-terminal alpha amino-group. Peptides that react only through the lysine side-chain amino groups will remain attached to the support in subsequent steps. Peptides that react only through their alpha-amino group will be cleaved from the support along with C-terminal peptide. This step allows the non-C-terminal peptides to be distinguished from C-terminal peptides. In this optional step, the bifunctional reagent may be N-succinimidyl[4-vinylsulphonyl]benzoate (SVSB from Pierce & Warriner Ltd., Chester, UK). This compound comprises an amine-reactive N-hydroxysuccinimide ester linked to a thiol-reactive vinyl sulphone moiety. The compound reacts very easily with amines via the ester functionality without reaction of the vinyl sulphone and can be separately reacted with thiols at a later stage. Thus the SVSB is reacted with any free amines on the support under slightly basic conditions. In the presence of a large excess of the SVSB compound and given that the peptides on the support are immobilised, the SVSB will react with the peptides only through the succinimide functionality, leaving the vinyl sulphone moiety free for further reaction. In alternative embodiments, any unreacted amines may be reacted with biotin coupled to an amine reactive functionality such as N-hydroxysuccinimide (NHS) biotin (Sigma-Aldrich Ltd, Dorset, England). This allows incompletely reacted peptides to be captured later on avidinated beads or on an avidinated resin in an affinity capture column.

[0310] In step 4 of this method the captured peptides are contacted with a reagent that cleaves at the alpha amino groups of each peptide on the support. In embodiments where DITC glass is used as the amine reactive support, the peptides are treated with an appropriate volatile acid such as trifluoroacetic acid (TFA) which cleaves the N-terminal amino acid from each peptide on the support. All peptides that are not C-terminal will have a lysine residue covalently linking them to the solid support. Thus free C-terminal peptides are selectively released.

[0311] The optional step 5 is preferred, especially if the optional step 3 is performed. The non-C-terminal peptides that do not react completely with the amine reactive support are removed by this step. If SVSB is used to tag non-C-terminal peptides that only reacted through their alpha-amino groups, they will have a reactive functionality available which will allow them to be reacted with a solid support derivatised with an appropriate nucleophile, preferably a thiol. If DITC glass is used in step 4, which is preferred, then the peptides may be released from the support using TFA. The released peptides may be recovered as trifluoroacetate salts by evaporating the TFA away. The peptides may then be resuspended in a buffer with a pH of about 7 or just in an appropriate neutral solvent such as dimethylformamide, dimethylsulphoxide or a mixture of water and acetone. The peptides are then added to the thiol derivatised support. At pH 7 the remaining vinyl functionality on the SVSB treated peptides should react almost exclusively with the thiol support rather than with free amines exposed by cleavage of the peptides from the DITC glass support. Thiol derivatised Tentagels are available from Rapp Polymere GmbH (Tübingen, Germany) or a thiol derivatised support can be prepared by incubating a silica gel with 3-mercaptopropyltrimethoxysilane.

[0312] In step 6 of this method the released peptides are recovered. If optional steps 3 and 5 are used the peptides may be present in a variety of solvents or buffers. These will be selected to be volatile in preferred embodiments. If the peptides are recovered directly from the first support, which is DITC glass in preferred embodiments, then it is likely that the peptides will be in TFA, which is volatile. The peptides are preferably recovered from these volatile solvents or buffers by evaporating the solvent or buffer. The peptides isolated by this method will have a free alpha-amino group available for reaction with the labels of this invention.

[0313] Labelling Isolated Peptides

[0314] Any of the mass labels of the present invention can be used in the protein expression profiling embodiments described in this aspect of the invention. The mass labels illustrated in FIGS. 22 and 23 are particularly preferred for use with this invention, especially this aspect of the present invention. These compounds have a vinyl sulphone reactive group, which will allow these compounds to undergo addition reactions with free amines and thiols. If only one label is desired per peptide then the proteins in the complex mixtures may be treated with capping agents prior to cleavage with the sequence specific endopeptidase. Phenyl, ethyl and methyl vinyl sulphone will react with free amines and thiols, capping them while still permitting cleavage by trypsin of the capped proteins. The epsilon amine residues of lysine will react with two vinyl sulphone moieties if the vinyl sulphone moieties are not hindered, particularly ethyl and methyl vinyl sulphone.

[0315] After attachment of the markers these labelled peptides will have a mass that is shifted by the mass of the label. The mass of the peptide may be sufficient to identify the source protein. In this case only the label needs to be detected, which can be achieved by selected reaction monitoring with a triple quadrupole, discussed in more detail below. Briefly, the first quadrupole of the triple quadrupole is set to let through ions whose mass-to-charge ratio corresponds to that of the peptide of interest, adjusted for the mass of the marker. The selected ions are then subjected to collision induced dissociation (CID) in the second quadrupole. Under the sort of conditions used in the analysis of peptides the ions will fragment mostly at the amide bonds in the molecule. The markers in FIGS. 22 and 23 have an amide bond, which releases the terminal pre-ionised portion of the tag on cleavage. Although the tags all have the same mass, the terminal portion is different because of differences in the substituents on either side of the amide bond. Thus the markers can be distinguished from each other. The presence of the marker fragment associated with an ion of a specific mass should confirm that the ion was a peptide and the relative peak heights of the tags from different samples will give information about the relative quantities of the peptides in their samples. If the mass is not sufficient to identify a peptide, either because a number of terminal peptides in the sample have the same terminal mass or because the peptide is not known, then sequence information may be determined by analysis of the complete CID spectrum. FIG. 24 shows a theoretical spectrum for two samples of a peptide with the sequence H₂N-gly-leu-ala-ser-glu-COOH, where each sample is attached to one of the labels with the formulae shown in FIG. 23. The spectrum is idealised, as it only shows the b-series fragments and does not show other fragmentations or any noise peaks; however, it does illustrate that the spectrum is clearly divided into a higher mass region corresponding to peptide fragmentation peaks and a lower mass region corresponding to mass label peaks. If desired, the peptide fragmentation peaks can be used to identify the peptides while the mass tag peaks give information about the relative quantities of the peptides.
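An idealised b-series of the kind described for the example peptide gly-leu-ala-ser-glu can be computed as in the following sketch. It uses standard monoisotopic residue masses and assumes singly charged ions. The label mass of 300.0 Da is purely hypothetical, since the masses of the labels of FIG. 23 are not given in this passage, and the label is modelled simply as a fixed mass shift at the N-terminus.

```python
# Monoisotopic residue masses (Da) for the residues of the example peptide.
RESIDUE_MASS = {
    "G": 57.02146, "L": 113.08406, "A": 71.03711,
    "S": 87.03203, "E": 129.04259,
}
PROTON = 1.00728  # mass of a proton (Da)


def b_series(sequence, n_term_label_mass=0.0):
    """Singly charged b-ion m/z values b1..b(n-1).

    n_term_label_mass is an assumed net mass shift from an N-terminal label.
    """
    ions, running = [], n_term_label_mass
    for aa in sequence[:-1]:
        running += RESIDUE_MASS[aa]
        ions.append(running + PROTON)
    return ions


unlabelled = b_series("GLASE")
labelled = b_series("GLASE", n_term_label_mass=300.0)  # hypothetical label mass
for n, (u, l) in enumerate(zip(unlabelled, labelled), start=1):
    print(f"b{n}: unlabelled {u:.4f}, labelled {l:.4f}")
```

Every b-ion of the labelled peptide is shifted upwards by the label mass, which is consistent with the peptide fragmentation peaks occupying a higher mass region than the fragments derived from the label itself.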

[0316] Separation of Labelled Peptides by Chromatography or Electrophoresis

[0317] Preferably in this aspect of the invention, the labelled terminal peptides are subjected to a chromatographic separation prior to analysis by mass spectrometry. This is preferably High Performance Liquid Chromatography (HPLC), which can be coupled directly to a mass spectrometer for in-line analysis of the peptides as they elute from the chromatographic column. A variety of separation techniques may be performed by HPLC but reverse phase chromatography is a popular method for the separation of peptides prior to mass spectrometry. Capillary zone electrophoresis is another separation method that may be coupled directly to a mass spectrometer for automatic analysis of eluting samples. These and other fractionation techniques may be applied to reduce the complexity of a mixture of peptides prior to analysis by mass spectrometry.

[0318] Protein Quantification and Identification by Tandem Mass Spectrometry

[0319] In the method of this aspect of the invention, the labelled isolated peptides are analysed by tandem mass spectrometry.

[0320] As discussed earlier, tandem mass spectrometers allow ions with a pre-determined mass-to-charge ratio to be selected and fragmented, e.g. by collision induced dissociation (CID). The fragments can then be detected, providing structural information about the selected ion. When peptides are analysed by CID in a tandem mass spectrometer, characteristic cleavage patterns are observed, which allow the sequence of the peptide to be determined. Natural peptides typically fragment randomly at the amide bonds of the peptide backbone to give series of ions that are characteristic of the peptide. CID fragment series are usually denoted a_(n), b_(n), c_(n), etc. for cleavage at the nth peptide bond, where the charge of the ion is retained on the N-terminal fragment of the ion. Similarly, fragment series are denoted x_(n), y_(n), z_(n), etc. where the charge is retained on the C-terminal fragment of the ion. This notation is depicted in the following Scheme 1:

[0321] Trypsin and thrombin are favoured cleavage agents for tandem mass spectrometry as they produce peptides with basic groups at both ends of the molecule, i.e. the alpha-amino group at the N-terminus and lysine or arginine side-chains at the C-terminus. This favours the formation of doubly charged ions, in which the charged centres are at opposite termini of the molecule. These doubly charged ions produce both C-terminal and N-terminal ion series after CID. This assists in determining the sequence of the peptide. Generally speaking only one or two of the possible ion series are observed in the CID spectra of a given peptide. In low-energy collisions typical of quadrupole based instruments the b-series of N-terminal fragments or the y-series of C-terminal fragments predominate. If doubly charged ions are analysed then both series are often detected. In general, the y-series ions predominate over the b-series.
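The relationship between charge state, precursor m/z and the C-terminal y-series can be illustrated numerically. The Python sketch below is an assumption-laden example, not part of the described method: the function names `precursor_mz` and `y_series` are hypothetical, the residue table covers only the illustrative peptide Gly-Leu-Ala-Ser-Glu (which is not a tryptic peptide), and standard monoisotopic masses are used.

```python
# Monoisotopic residue masses (Da); partial table for the illustrative peptide.
RESIDUE_MASS = {
    "G": 57.02146,
    "L": 113.08406,
    "A": 71.03711,
    "S": 87.03203,
    "E": 129.04259,
}
WATER = 18.010565   # H2O (Da)
PROTON = 1.007276   # proton (Da)


def precursor_mz(sequence, charge):
    """m/z of the intact peptide carrying `charge` protons."""
    neutral = sum(RESIDUE_MASS[r] for r in sequence) + WATER
    return (neutral + charge * PROTON) / charge


def y_series(sequence):
    """Singly charged y-ion m/z values y1..y(n-1), C-terminal fragments."""
    ions = []
    running = WATER  # y-ions retain the C-terminal OH plus an extra H
    for residue in reversed(sequence[1:]):
        running += RESIDUE_MASS[residue]
        ions.append(running + PROTON)
    return ions


if __name__ == "__main__":
    for z in (1, 2):
        print(z, round(precursor_mz("GLASE", z), 4))
    print([round(mz, 4) for mz in y_series("GLASE")])
```

A doubly charged precursor appears at roughly half the m/z of the singly charged one, so its singly charged fragments can lie above the precursor m/z, a property used in one of the scanning approaches discussed for identifying doubly charged ions.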

[0322] If the isolated peptides used in the method of this invention are C-terminal peptides isolated using DITC glass as discussed above, the peptides will have a free amine after isolation at their N-termini, facilitating labelling with the labels of this invention. As mentioned above, these labels may all have the same mass so equivalent peptides in each sample that is analysed will be shifted in mass by the same amount. CID of these peptides will produce fragments from the labels. The intensities of the label fragments will allow the relative quantities of equivalent peptides in each sample to be determined. Covalently linking the mass labels of this invention to the N-termini of the isolated peptides will shift the masses of the b-series of fragment ions by the mass of the label, as long as the charge remains on the label. Since the mass of the label used for each sample under analysis is the same, there will be only one ion series produced for all of the samples as long as collision induced scission of the labelled peptides takes place in the peptide backbone. This means that it is possible to identify the labelled peptides by their fragment ions and for any given peptide there will be only one fragment series for that peptide, irrespective of the number of samples being analysed simultaneously. Fragmentation within the labels themselves will produce peaks characteristic of each sample. These peaks will occur in a relatively low mass range (see FIG. 24). With a triple quadrupole instrument, it is preferable to use selected reaction monitoring to achieve the most sensitive detection of these peaks. The relative intensities of these peaks will be indicative of the relative amounts of the source protein, from which the peptide was derived, in the original samples. In natural peptides, the b-series of fragment ions tends to be of lower intensity than the y-series. With an appropriately basic mass label or a “pre-ionised” mass label, comprising for example a quaternary ammonium centre, the intensity of the b-series of ion fragments may be enhanced. Unfortunately, if C-terminal peptides are used there is no guarantee that the C-terminal amino acid will be basic, so the y-series fragment ions may be weak. Determination of structural information using the y-series would require that the C-terminus of these peptides carry a basic group or a “pre-ionised” group.
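The step from label fragment intensities to relative quantities can be sketched as a simple normalisation. The Python example below is a minimal illustration under stated assumptions: the function name `relative_abundance`, the sample names and the intensity values are all hypothetical, and equal labelling and ionisation efficiency across samples is assumed.

```python
def relative_abundance(intensities):
    """Convert label-fragment peak intensities into fractions of the total.

    intensities maps a sample name to the intensity of the peak produced by
    that sample's mass marker fragment. Equal labelling and ionisation
    efficiency across samples is assumed.
    """
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return {sample: value / total for sample, value in intensities.items()}


if __name__ == "__main__":
    # Hypothetical peak heights for the same peptide from two pooled samples.
    print(relative_abundance({"sample_1": 300.0, "sample_2": 100.0}))
```

With these made-up intensities the peptide would be estimated to be three times as abundant in the first sample as in the second.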

[0323] The analysis of proteins by tandem mass spectrometry, particularly mixtures of proteins, is complicated by the “noisiness” of the spectra obtained. Proteins isolated from biological samples are usually contaminated with buffering reagents, denaturants and detergents, all of which introduce peaks into the mass spectrum. As a result, there are often more contamination peaks in the spectrum than peptide peaks, and identifying peaks that correspond to peptides can be a major problem, especially with small samples of proteins that are difficult to isolate. As a result various methods are used to determine which peaks correspond to peptides before detailed CID analysis is performed. Triple quadrupole based instruments permit “precursor ion scanning” (see Wilm M. et al., Anal. Chem. 68(3) 527-33, “Parent ion scans of unseparated peptide mixtures” (1996)). The triple quadrupole is operated in “single reaction monitoring” mode, in which the first quadrupole scans over the full mass range and each gated ion is subjected to CID in the second quadrupole. The third quadrupole is set to detect only one specific fragment ion, which is usually a characteristic fragment ion from a peptide such as an immonium ion. An alternative method used with quadrupole/time-of-flight mass spectrometers scans for doubly charged ions by identifying ions which when subjected to CID produce daughter ions with higher mass-to-charge ratios than the parent ion. A further method of identifying doubly charged ions is to look for sets of peaks in the spectrum which are only 0.5 daltons apart with appropriate intensity ratios, which would indicate that the ions are the same, differing only by the proportion of ¹³C present in the molecule.
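The isotope-spacing test for doubly charged ions can be expressed as a small check: adjacent ¹³C isotope peaks of an ion with charge z are separated by about 1.00335/z in m/z. The Python sketch below is illustrative only; the function name `charge_from_spacing`, the tolerance and the maximum charge considered are assumptions chosen for the example, and the intensity-ratio check mentioned in the text is not implemented.

```python
C13_C12_DIFF = 1.003355  # mass difference between 13C and 12C (Da)


def charge_from_spacing(mz_a, mz_b, max_charge=4, tol=0.01):
    """Infer the charge state from the spacing of two adjacent isotope peaks.

    Returns the charge z for which the observed spacing matches
    C13_C12_DIFF / z within `tol`, or None if no charge up to max_charge fits.
    """
    spacing = abs(mz_b - mz_a)
    for z in range(1, max_charge + 1):
        if abs(spacing - C13_C12_DIFF / z) <= tol:
            return z
    return None


if __name__ == "__main__":
    print(charge_from_spacing(500.0, 500.5))  # spacing of 0.5 suggests z = 2
    print(charge_from_spacing(500.0, 501.0))  # spacing of 1.0 suggests z = 1
```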

[0324] By labelling peptides with the mass labels of this invention, a novel form of precursor ion scanning may be envisaged in which peptide peaks are identified by the presence of fragments corresponding to the mass labels of this invention after subjecting the labelled peptides to CID. In particular, the peptides isolated from each sample by the methods of this invention may be labelled with more than one mass label. An equimolar mixture of a “precursor ion scanning” label which is used in all samples and a sample specific label may be used to label the peptides in each sample. In this way changes in the level of peptides in different samples will not have an adverse effect on the identification of peptide peaks in a precursor ion scan.

[0325] Having identified and selected a peptide ion, it is preferably subjected to CID. The CID spectra are often quite complex and determining which peaks in the CID spectrum correspond to meaningful peptide fragment series is a further problem in determining the sequence of a peptide by mass spectrometry. Shevchenko et al., Rapid Commun. Mass Spec. 11 1015-1024 (1997) describe a further method, which involves treating proteins for analysis with trypsin in 1:1 ¹⁶O/¹⁸O water. The hydrolysis reaction results in two populations of peptides, the first whose terminal carboxyl contains ¹⁶O and the second whose terminal carboxyl contains ¹⁸O. Thus for each peptide in the sample there should be a double peak of equal intensity, where the two peaks are 2 daltons apart. This is complicated slightly by intrinsic peptide isotope peaks but allows for automated scanning of the CID spectrum for doublets. The differences in mass between doublets can be determined to identify the amino acid by which the two fragments differ. This method may be applicable with the methods of this invention if N-terminal peptides are isolated.
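Automated scanning for such doublets can be sketched as follows. This Python example is a hedged illustration rather than the published procedure: the function names `find_doublets` and `residue_between`, the tolerances, the partial residue table and the peak list are all assumptions made for the example, and intrinsic isotope peaks are ignored.

```python
O18_SHIFT = 2.004246  # mass difference between 18O and 16O (Da)

# Partial table of monoisotopic residue masses (Da), for illustration only.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "L": 113.08406,
    "E": 129.04259,
}


def find_doublets(peaks, mz_tol=0.02, ratio_tol=0.25):
    """Find pairs of peaks about 2 Da apart with near-equal intensity.

    peaks is a list of (m/z, intensity) tuples for singly charged fragments.
    Returns a list of (light m/z, heavy m/z) pairs.
    """
    peaks = sorted(peaks)
    doublets = []
    for i, (mz_light, int_light) in enumerate(peaks):
        if int_light <= 0:
            continue  # cannot form an intensity ratio
        for mz_heavy, int_heavy in peaks[i + 1:]:
            spacing_ok = abs((mz_heavy - mz_light) - O18_SHIFT) <= mz_tol
            ratio_ok = abs(int_heavy / int_light - 1.0) <= ratio_tol
            if spacing_ok and ratio_ok:
                doublets.append((mz_light, mz_heavy))
    return doublets


def residue_between(doublet_a, doublet_b, mass_tol=0.02):
    """Name the residue whose mass separates two doublets, or None."""
    diff = abs(doublet_b[0] - doublet_a[0])
    for residue, mass in RESIDUE_MASS.items():
        if abs(diff - mass) <= mass_tol:
            return residue
    return None


if __name__ == "__main__":
    # Hypothetical peak list: two doublets plus one unpaired noise peak.
    spectrum = [(148.0604, 100.0), (150.0646, 95.0), (200.0, 50.0),
                (235.0924, 80.0), (237.0966, 82.0)]
    pairs = find_doublets(spectrum)
    print(pairs)
    print(residue_between(pairs[0], pairs[1]))
```

In this made-up spectrum the unpaired peak at 200.0 is ignored, and the two doublets are separated by the mass of a serine residue.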

Sequence listing: SEQ ID NO: 1; length 5; type PRT; Artificial Sequence. Description of Artificial Sequence: Illustrative peptide. Sequence: Gly Leu Ala Ser Glu (residues 1 to 5).

1. A set of two or more mass labels, each label in the set comprising a mass marker moiety attached via a cleavable linker to a mass normalisation moiety, wherein the aggregate mass of each label in the set may be the same or different and the mass of the mass marker moiety of each label in the set may be the same or different, and wherein the set comprises a group of labels having a mass marker moiety of a common mass or the set comprises a group of labels having a common aggregate mass, and wherein in any group of labels within the set having a common aggregate mass each label has a mass marker moiety having a mass different from that of all other mass marker moieties in that group, such that all of the mass labels in the set are distinguishable from each other by mass spectrometry.
2. A set of mass labels according to claim 1, in which each label in the set comprises a mass marker moiety having a common mass and each label in the set has a unique aggregate mass.
3. A set of mass labels according to claim 1, in which each label in the set comprises a mass marker moiety having a unique mass and each label in the set has a common aggregate mass.
4. A set of mass labels according to claim 3, in which each mass marker moiety in the set has a common basic structure, and each mass normalisation moiety in the set has a common basic structure that may be the same or different from the common basic structure of the mass marker moieties, and wherein each mass label in the set comprises one or more mass adjuster moieties, the mass adjuster moieties being attached to or situated within the basic structure of the mass marker moiety and/or the basic structure of the mass normalisation moiety, such that every mass marker moiety in the set comprises a different number of mass adjuster moieties and every mass label in the set has the same number of mass adjuster moieties.
5. A set of mass labels according to claim 4, each mass label in the set having the following structure:
M(A)_(y)-L-X(A)_(z)
wherein M is a mass normalisation moiety, X is a mass marker moiety, A is a mass adjuster moiety, L is a cleavable linker, y and z are integers of 0 or greater, and y+z is an integer of 1 or greater.
6. A set of mass labels according to claim 4 or claim 5, wherein the mass adjuster moiety is selected from: (a) an isotopic substituent situated within the basic structure of the mass marker moiety and/or within the basic structure of the mass normalisation moiety, and (b) substituent atoms or groups attached to the basic structure of the mass marker moiety and/or attached to the basic structure of the mass normalisation moiety.
7. A set of mass labels according to claim 6, wherein the mass adjuster moiety is selected from a halogen atom substituent, a methyl group and ²H or ¹³C isotopic substituents.
8. A set of mass labels according to claim 7, wherein the mass adjuster moiety is a fluorine atom substituent.
9. A set of mass labels according to any preceding claim, wherein the cleavable linker attaching the mass marker moiety to the mass normalisation moiety is a linker cleavable by collision.
10. A set of mass labels according to claim 9, wherein the cleavable linker comprises an amide bond.
11. A set of mass labels according to any preceding claim, wherein the mass marker moiety and/or the mass normalisation moiety comprises a fragmentation resistant group.
12. A set of mass labels according to claim 11, wherein the mass normalisation moiety comprises a phenyl group.
13. A set of mass labels according to any preceding claim, wherein the mass marker moiety comprises a pre-ionised group.
14. A set of mass labels according to claim 13, wherein the mass marker moiety comprises an N-methyl pyridyl group, or a group selected from the following groups: -NH₂, -NR₂, -NR₃⁺, -SR₃⁺, -SO₃⁻, -PO₄⁻, -PO₃⁻, -CO₂⁻,

wherein R is hydrogen or is a substituted or unsubstituted aliphatic, aromatic, cyclic or heterocyclic group.
15. A set of mass labels according to any of claims 5-14, wherein each of the labels in the set has the following structure:

wherein R is hydrogen or is a substituted or unsubstituted aliphatic, aromatic, cyclic or heterocyclic group; L is a cleavable linker; A is a mass adjuster moiety; each p is the same and is an integer of 0 or greater; each y′ may be the same or different and is an integer of 0-4, the sum of all y′ for any one label being equal to y; each z′ may be the same or different and is an integer of 0-4, the sum of all z′ for any one label being equal to z; and y+z is an integer of 1 or greater.
16. A set of mass labels according to claim 15, wherein R is H, L is an amide bond, p=0, and A is an F atom.
17. An array of mass labels, comprising two or more sets of mass labels as defined in any of claims 3-16, wherein the aggregate mass of each of the mass labels of any one set in the array is different from the aggregate mass of each of the mass labels of every other set in the array.
18. An array of mass labels according to claim 17, wherein each mass label in at least one set comprises a mass series modifying group of a common mass, the mass series modifying group in each of the mass labels of any one set having a different mass from the mass series modifying groups in each of the mass labels of every other set in the array.
19. An array of mass labels according to claim 18, wherein the mass series modifying groups are attached to the mass labels such that, upon cleaving the cleavable linker of the mass labels, the mass series modifying groups become detached from the mass marker moieties.
20. An array of mass labels according to claim 18 or claim 19, wherein the mass series modifying groups of each set in the array have a common basic structure, and each mass label of any one set in the array has the same number of mass series modifying groups as the other mass labels of that set, and a different number of mass series modifying groups from the mass labels of every other set in the array.
21. An array of mass labels according to claim 20, each mass label in the array having either of the following structures:
(S)_(x)-M(A)_(y)-L-X(A)_(z)
M(A)_(y)-(S)_(x)-L-X(A)_(z)
wherein S is a mass series modifying group; M is a mass normalisation moiety; X is a mass marker moiety; A is a mass adjuster moiety; L is a cleavable linker; x is an integer of 0 or greater; y and z are integers of 0 or greater; and y+z is an integer of 1 or greater.
22. An array of mass labels according to any of claims 18-21, wherein the mass series modifying groups comprise an aryl ether group.
23. An array of mass labels according to claim 21 or claim 22, each mass label in the array having either of the following structures:

wherein R is hydrogen or is a substituted or unsubstituted aliphatic, aromatic, cyclic or heterocyclic group; each p is the same and is an integer of 0 or greater; x is an integer of 0 or greater, x being the same for each mass label in any one set of the array, and the x of any one set being different from the x of every other set in the array; each y′ may be the same or different and is an integer of 0-4, the sum of all y′ for any one label being equal to y, and each z′ may be the same or different and is an integer of 0-4, the sum of all z′ for any one label being equal to z; and y+z is an integer of 1 or greater.
24. An array of mass labels according to claim 18 or claim 19, wherein the mass series modifying groups of every set in the array have a common basic structure, the mass series modifying group of the mass labels of at least one set comprising one or more mass adjuster moieties, the mass adjuster moieties being attached to or situated within the basic structure of the mass series modifying group.
25. An array of mass labels according to claim 24, in which every mass label of every set in the array has the same number of mass series modifying groups, wherein the mass series modifying group in each mass label of any one set has the same number of mass adjuster moieties as the mass series modifying groups in every other label of that set, and wherein the mass series modifying groups in the mass labels of any one set have a different number of mass adjuster moieties from the mass series modifying groups in the labels of every other set in the array.
26. An array of mass labels according to claim 25, wherein each of the sets in the array comprises mass labels having either of the following structures:
S(A*)_(r)-M(A)_(y)-L-X(A)_(z)
M(A)_(y)-S(A*)_(r)-L-X(A)_(z)
wherein S is a mass series modifying group; M is a mass normalisation moiety; X is a mass marker moiety; A is a mass adjuster moiety of the mass marker moieties and mass normalisation moieties; A* may be the same or different from A and is a mass adjuster moiety of the mass series modifying groups; L is a cleavable linker; r is an integer of 0 or greater and is at least 1 for one or more sets of mass labels in the array; y and z are integers of 0 or greater; and y+z is an integer of 1 or greater.
27. An array of mass labels according to claim 26, wherein each of the sets in the array comprises mass labels having either of the following structures:

wherein R is hydrogen or is a substituted or unsubstituted aliphatic, aromatic, cyclic or heterocyclic group; each p is the same and is an integer of 0 or greater; x is an integer of 0 or greater, x being the same for all mass labels in the array; each y′ may be the same or different and is an integer of 0-4, the sum of all y′ for any one label being equal to y; each z′ may be the same or different and is an integer of 0-4, the sum of all z′ for any one label being equal to z; y+z is an integer of 1 or greater; each r′ may be the same or different, the sum of all r′ for any one label being equal to r; and r is an integer of 0 or greater and is at least 1 for one or more sets of mass labels in the array.
28. An array of mass labels according to any of claims 17-27, wherein the mass labels of any one set differ in mass from the mass labels of every other set in the array by 4 daltons or more.
29. A set of two or more probes, each probe in the set being different and being attached to a unique mass label or a unique combination of mass labels, from a set or an array of mass labels as defined in any of claims 1-28.
30. An array of probes comprising two or more sets of probes, wherein each probe in any one set is attached to a unique mass label, or a unique combination of mass labels, from a set of mass labels as defined in any of claims 1-16, and wherein the probes in any one set are attached to mass labels from the same set of mass labels, and each set of probes is attached to mass labels from unique sets of mass labels from an array of mass labels as defined in any of claims 17-28.
31. A set or array of probes according to claim 29 or claim 30, wherein each probe is attached to a unique combination of mass labels, each combination being distinguished by the presence and absence of each mass label in the set of mass labels and/or the quantity of each mass label attached to the probe.
32. A set or array of probes according to any of claims 29-31, wherein each probe comprises a biomolecule.
33. A set or array of probes according to claim 32, wherein the biomolecule is selected from a DNA, an RNA, an oligonucleotide, a nucleic acid base, a protein and/or an amino acid.
34. A method of analysis, which method comprises detecting an analyte by identifying by mass spectrometry a mass label or a combination of mass labels relatable to the analyte, wherein the mass label is a mass label from a set or an array of mass labels as defined in any of claims 1-28.
35. A method according to claim 34, in which two or more analytes are detected by simultaneously identifying their mass labels or combinations of mass labels by mass spectrometry.
36. A method according to claim 34 or claim 35, wherein each analyte is identified by a unique combination of mass labels from a set or array of mass labels, each combination being distinguished by the presence and absence of each mass label in the set or array and/or the quantity of each mass label.
37. A method according to any of claims 34-36 for identifying two or more analytes, wherein the analytes are separated according to their mass, prior to detecting the mass label by mass spectrometry.
38. A method according to claim 37, wherein separation is carried out by a chromatographic or electrophoretic method.
39. A method according to any of claims 34-38, wherein the mass spectrometer employed to detect the mass label comprises one or more mass analysers, which mass analysers are capable of allowing ions of a particular mass, or range of masses, to pass through for detection and/or are capable of causing ions to dissociate.
40. A method according to claim 39, wherein ions of a particular mass or range of masses specific to one or more known mass labels are selected using the mass analyser, the selected ions are dissociated, and the dissociation products are detected to identify ion patterns indicative of the selected mass labels.
41. A method according to claim 39 or claim 40, wherein the mass spectrometer comprises three quadrupole mass analysers.
42. A method according to claim 40 or claim 41, wherein a first mass analyser is used to select ions of a particular mass or mass range, a second mass analyser is used to dissociate the selected ions, and a third mass analyser is used to detect resulting ions.
43. A method according to any of claims 34-42, which method comprises: (a) contacting one or more analytes with a set of probes, or an array of probes, wherein the probes are as defined in any of claims 29-33, (b) identifying an analyte, by detecting a probe relatable to that analyte.
44. A method according to claim 43, wherein the mass label is cleaved from the probe prior to detecting the mass label by mass spectrometry.
45. A method according to claim 43 or claim 44, which method comprises contacting one or more nucleic acids with a set of hybridisation probes.
46. A method according to claim 45, wherein the set of hybridisation probes comprises a set of up to 256 4-mers, each probe in the set having a different combination of nucleic acid bases.
47. A method of two-dimensional mass spectrometric analysis, which method comprises: (a) providing one or more analytes, each analyte being labelled with a mass label or a combination of mass labels, wherein the mass labels are from a set or an array of mass labels as defined in any of claims 1-28; (b) cleaving the mass labels from the analytes; (c) detecting the mass labels; (d) dissociating the mass labels in the mass spectrometer, to release the mass marker moieties from the mass normalisation moieties; (e) detecting the mass marker moieties; and (f) identifying the analytes on the basis of the mass spectrum of the mass labels and the mass spectrum of the mass marker moieties.
48. A method according to claim 47, wherein in step (c) mass labels of a chosen mass or a chosen range of masses are selected for detection, and/or in step (e) mass marker moieties having a specific mass or a specific range of masses are selected for detection.
49. A method of analysis, which method comprises: (a) subjecting a mixture of labelled analytes to a first separation treatment on the basis of a first property of the analytes; (b) subjecting the resulting separated analytes to a second separation treatment on the basis of a second property of the analytes; and (c) detecting an analyte by detecting its label; wherein the analytes are labelled with a mass label from a set or an array of mass labels as defined in any of claims 1-28.
50. A method according to claim 49, wherein in step (a) and/or step (b) the analytes are separated according to their length or mass.
51. A method according to claim 49 or 50, wherein in step (a) and/or step (b) the analytes are separated according to their iso-electric point.
52. A method according to any of claims 49-51, wherein the analytes comprise a protein, a polypeptide, a peptide, an amino acid or a nucleic acid, or fragments thereof.
53. A method of 2-dimensional gel electrophoresis according to any of claims 49-52.
54. A method for characterising nucleic acid, which comprises: (a) providing a population of nucleic acid fragments, each fragment having cleavably attached thereto a mass label from a set or an array of mass labels as defined in any of claims 1-28 for identifying a feature of that fragment; (b) separating the fragments on the basis of their length; (c) cleaving each fragment to release its mass label; and (d) determining each mass label by mass spectroscopy to relate the feature of each fragment to the length of the fragment.
55. A method according to claim 54 for characterising cDNA, which method comprises: (a) exposing a sample comprising a population of one or more cDNAs or fragments thereof to a cleavage agent which recognises a predetermined sequence and cuts at a reference site at a known displacement from the predetermined sequence proximal to an end of each cDNA or fragment thereof so as to generate a population of terminal fragments; (b) ligating to each reference site an adaptor oligonucleotide which comprises a recognition site for a sampling cleavage agent; (c) exposing the population of terminal fragments to a sampling cleavage agent which binds to the recognition site and cuts at a sampling site of known displacement from the recognition site so as to generate in each terminal fragment a sticky end sequence of a predetermined length of up to 6 bases, and of unknown sequence; (d) separating the population of terminal fragments into sub-populations according to sequence length; and (e) determining each sticky end sequence by: (i) probing with an array of labelled hybridisation probes, the array containing all possible base sequences of the predetermined length; (ii) ligating those probes which hybridise to the sticky end sequences; and (iii) determining which probes are ligated by identification and preferably quantification of the labels; wherein the labels are mass labels from a set or an array as defined in any of claims 1-28.
56. A method according to claim 55, wherein the population of terminal fragments is separated by capillary electrophoresis, HPLC or gel electrophoresis.
57. A method for characterising nucleic acid, which method comprises generating Sanger ladder nucleic acid fragments from one or more nucleic acid templates, in the presence of at least one labelled terminating base, and identifying the length of the fragment, and the terminating base of the fragment, wherein the label is relatable to the terminating base and is a mass label from a set or an array as defined in any of claims 1-28.
58. A method according to claim 57, wherein all four terminating bases are present in the same reaction zone.
59. A method according to claim 57 or claim 58, which method comprises generating Sanger ladder nucleic acid fragments from a plurality of nucleic acid templates present in the same reaction zone, and for each nucleic acid fragment produced identifying the length of the fragment, the identity of the template from which the fragment is derived and the terminating base of the fragment, wherein prior to generating the fragments, a labelled primer nucleotide or oligonucleotide is hybridised to each template, the label on each primer being specific to the template to which that primer hybridises to allow identification of the template.
60. A method according to claim 59, wherein the label identifying the template is a mass label from a set or an array as defined in any of claims 1-28.
61. A method for sequencing nucleic acid, which method comprises: (a) obtaining a target nucleic acid population comprising one or more single-stranded DNAs to be sequenced, each of which is present in a unique amount and bears a primer to provide a double-stranded portion of the nucleic acid for ligation thereto; (b) contacting the nucleic acid population with an array of hybridisation probes, each probe comprising a label cleavably attached to a known base sequence of predetermined length, the array containing all possible base sequences of that predetermined length and the base sequences being incapable of ligation to each other, wherein the contacting is carried out in the presence of ligase under conditions to ligate to the double-stranded portion of each nucleic acid the probe bearing the base sequence complementary to the single-stranded nucleic acid adjacent the double-stranded portion thereby to form an extended double-stranded portion which is incapable of ligation to further probes; and (c) removing all unligated probes; followed by the steps of: (d) cleaving the ligated probes to release each label; (e) recording the quantity of each label by mass spectrometry; and (f) activating the extended double-stranded portion to enable ligation thereto; wherein (g) steps (b) to (f) are repeated in a cycle for a sufficient number of times to determine the sequence of the or each single-stranded nucleic acid by determining the sequence of release of each label, wherein the labels of the hybridisation probes are each from a set or an array as defined in any of claims 1-28.
62. A method according to claim 61, wherein the hybridisation probes are a set of 256 4-mers, each probe in the set having a different combination of nucleic acid bases.
63. A method for characterising a sample comprising peptides, polypeptides and/or proteins, which method comprises: (a) providing a sample comprising peptides, polypeptides and/or proteins, each peptide, polypeptide and/or protein having cleavably attached thereto a mass label, or a combination of mass labels from a set or an array of mass labels as defined in any of claims 1-28, wherein each peptide, polypeptide and/or protein is relatable to its label or combination of labels; (b) analysing the labelled peptides, polypeptides and/or proteins, by mass spectrometry to detect the labels.
64. A method according to claim 63, wherein the sample is provided by forming peptides from the action of a cleavage agent on a sample comprising polypeptides and/or proteins.
65. A method according to claim 63 or claim 64, wherein the labelled peptides, polypeptides and/or proteins are separated by a chromatographic or electrophoretic method, prior to analysing.
66. A method according to any of claims 63-65, wherein a plurality of samples is provided.
67. A method according to claim 66, wherein the plurality of samples is pooled, prior to analysis.
68. A method according to any of claims 63-67, wherein one or more of the peptides, polypeptides and/or proteins in the sample is post-translationally modified and comprises a carbohydrate, and wherein the method comprises biotinylating the modified peptide, polypeptide or protein by attaching biotin via a carbonyl group of the carbohydrate.
69. A method according to any of claims 63-67, wherein one or more of the peptides, polypeptides and/or proteins in the sample is post-translationally modified by tyrosine phosphorylation, and wherein the method comprises separating such modified peptides, polypeptides and/or proteins by affinity chromatography using an anti-phosphotyrosine antibody.
70. A method according to any of claims 63-69, which method comprises isolating a single peptide fragment from each peptide, polypeptide and/or protein.
71. A method according to claim 70, wherein each isolated fragment is a terminal fragment.
72. A method according to claim 71, which method comprises: (a) treating a sample comprising a population of a plurality of polypeptides with a cleavage agent which is known to recognise in polypeptide chains a specific amino acid residue or sequence and to cleave at a cleavage site, whereby the population is cleaved to generate peptide fragments; (b) isolating a population of peptide fragments bearing as a reference terminus the N-terminus or the C-terminus of the polypeptide from which they were fragmented, each peptide fragment bearing at the other end the cleavage site proximal to the reference terminus; (c) prior to or after isolating the peptide fragments, labelling each reference terminus of the polypeptides with a mass label, or a combination of mass labels from a set or an array of mass labels as defined in any of claims 1-28, wherein each reference terminus is relatable to its label or combination of labels; and (d) determining by mass spectrometry a signature sequence of one or more of the isolated fragments, which signature sequence is the sequence of a pre-determined number of amino acid residues running from the cleavage site; wherein a signature sequence characterises each polypeptide.
73. A method according to claim 71, which method comprises: (a) contacting a sample comprising one or more polypeptides with a first cleavage agent to generate polypeptide fragments; (b) isolating one or more polypeptide fragments, each fragment comprising the N-terminus or the C-terminus of the polypeptide from which it was fragmented; (c) prior to or after isolating the polypeptide fragments, labelling each terminus of the polypeptides with a mass label, or a combination of mass labels from a set or an array of mass labels as defined in any of claims 1-28, wherein each terminus is relatable to its label or combination of labels; and (d) identifying the isolated fragments by mass spectrometry; (e) repeating steps (a)-(d) on the sample using a second cleavage agent that cleaves at a different site from the first cleavage agent; and (f) characterising the one or more polypeptides in the sample from the fragments identified in steps (d) and (e).
74. Use of a mass label from a set or an array of labels as defined in any of claims 1-28, in a method of analysis by mass spectrometry.
75. Use according to claim 74 in a method of 2-dimensional electrophoretic analysis.
76. Use according to claim 74 in a method of 2-dimensional mass spectrometric analysis.
77. Use according to any of claims 74-76 in a method of sequencing one or more nucleic acids.
78. Use according to any of claims 74-76 in a method of gene expression profiling.
79. Use according to any of claims 74-76 in a method of protein expression profiling.
80. Use according to any of claims 74-76 in a method of nucleic acid sorting.